

PHILOSOPHICAL 
MAGAZINE 


FIRST PUBLISHED IN 1798 


Vol. 41 SEVENTH SERIES No. 314 MARCH, 1950


A Journal of
Theoretical, Experimental
and Applied Physics


EDITOR 
PROFESSOR N. F. MOTT, M.A., D.Sc., F.R.S.


EDITORIAL BOARD
SIR LAWRENCE BRAGG, O.B.E., M.C., M.A., D.Sc., F.R.S.
ALLAN FERGUSON, M.A., D.Sc.
SIR GEORGE THOMSON, M.A., D.Sc., F.R.S.
PROFESSOR A. M. TYNDALL, C.B.E., D.Sc., F.R.S.


PRICE 10s. 
Annual Subscription £5 2s. 6d. payable in advance. 


PRINTED AND PUBLISHED BY TAYLOR & FRANCIS LTD., RED LION COURT, FLEET STREET, LONDON E.C.4




THE MATHEMATICAL WORKS
OF JOHN WALLIS, D.D., F.R.S.
by
J. F. SCOTT, Ph.D., B.A.


"His work will be indispensable to those interested in the early history of
The Royal Society. I commend to all students of the Seventeenth Century,
whether scientific or humane, this learned and lucid book." (Extract from
foreword by Prof. E. N. da C. Andrade, D.Sc., Ph.D., F.R.S.)


Recommended for publication by University of London


12/6 net


Printed and Published by: 


TAYLOR & FRANCIS, LTD. 
RED LION COURT, FLEET STREET, LONDON, E.C.4. 




XIX. Note on Potentials Derived from Axial Values in Electron Optics.


By Feodora Berz, L. ès Sc.,
Mullard Electronic Research Laboratories †.


[Received October 31, 1949.] 


SYNOPSIS. 


The determination of electrode systems which produce a precalculated 
distribution along the optic axis is examined. It is shown that an 
analytic potential distribution is completely determined when its values 
along any arbitrarily small interval on the axis are exactly known. On 
the other hand if, as usual, the value of the potential along a finite stretch 
of the axis is prescribed, within arbitrarily small but finite limits, this 
is compatible with an infinity of different potential distributions, which 
can be constructed without any singularity at finite distances, and which 
can assume prescribed values at any number of points outside the axis
of symmetry. 


§ 1. INTRODUCTION. 


The determination of electrode systems to produce a precalculated
potential distribution along the optic axis is a standard problem of 
electron optics. By a well-known theorem, the potential on the axis 
uniquely determines the distribution outside it. This theorem, though 
very often quoted, is not always properly understood, especially in its 
practical implications. The reason is that in this extrapolation problem 
the difference between the rigorous, mathematical, and the everyday 
or practical concept of a function, which is often overlooked in physical 
problems, has far reaching consequences.

In order to make this clear, it will be shown in this paper, first that if 
the concept of function is used in the mathematical sense, the potential 
function in the plane is determined by its values on a stretch smaller 
than any prescribed length (its values along the whole axis being either 
redundant, or leading to non-analytic solutions). Second, it will be shown 
that if the concept of the “ given function ” is used in its practical meaning, 
i.e. if the potential on the axis is known within finite but arbitrarily 
small limits, the potential problem admits an infinity of solutions. 

The chief practical consequence of these results is that in many electron 
optical problems, where the exact realization of a precalculated potential 
distribution on the axis leads to inconvenient or even impossible shapes, 
very small variations of the data, which do not appreciably influence the 
electron trajectories, may result in electrode systems far more convenient 
for practical application. 


† Communicated by Dr. D. Gabor.


For convenience we shall limit ourselves to plane electrostatic fields, 
with potential distributions symmetrical with respect to the OX axis. 
The case of rotational symmetry is more complicated, but the general 
results obtained can be transferred without alteration, as will be briefly 
summed up in §4, at the end of this paper. 


§ 2. RIGOROUS MATHEMATICAL SOLUTION: THEOREM OF UNIQUENESS.


Consider a potential distribution V(x, y) which can be expanded in a
Taylor series in the neighbourhood of any given point, with the possible
exception of some isolated singularities. Such a distribution will be
called "analytic", by an extension of the theory of functions of a complex
variable.

In the absence of space charge, the only case considered here,
the function V(x, y) must satisfy Laplace's equation

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = 0. \tag{1}$$


Fig. 1. 




If v(x) is the potential distribution along the OX axis, the solution in the
whole plane is

$$V(x, y) = \operatorname{Re}\, v(x+iy), \tag{2}$$

where Re v(x+iy) stands for the real part of the function v(x+iy).
Another well-known formula giving V(x, y) as a power series, and valid
in the neighbourhood of the OX axis, is

$$V(x, y) = v(x) - \frac{y^2}{2!}\frac{d^2 v}{dx^2} + \frac{y^4}{4!}\frac{d^4 v}{dx^4} - \dots + \frac{(-1)^n y^{2n}}{(2n)!}\frac{d^{2n} v}{dx^{2n}} + \dots \tag{3}$$

Formula (3) can be derived from formula (2), by expanding it in a Taylor
series.
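The agreement between formulae (2) and (3) can be checked numerically. The following Python sketch is not part of the original paper. It assumes, purely for illustration, the axial potential v(x) = −cos x, for which Re v(x+iy) = −cos x cosh y, and compares formula (2) with a truncated form of series (3). The function names are hypothetical.

```python
import cmath
import math


def potential_exact(x, y):
    """Formula (2): V(x, y) = Re v(x + iy) for the assumed v(x) = -cos x."""
    return (-cmath.cos(complex(x, y))).real


def potential_series(x, y, terms=12):
    """Truncated series (3) for the assumed v(x) = -cos x.

    The 2n-th derivative of -cos x is (-1)**n * (-cos x), so the n-th term is
    (-1)**n * y**(2n) / (2n)! multiplied by that derivative.
    """
    total = 0.0
    for n in range(terms):
        derivative = (-1) ** n * (-math.cos(x))
        total += (-1) ** n * y ** (2 * n) / math.factorial(2 * n) * derivative
    return total


if __name__ == "__main__":
    x, y = 0.7, 1.3
    print(potential_exact(x, y), potential_series(x, y))
```

For this entire function the series converges for every y; for a potential with singularities off the axis the series is valid only near the OX axis, as stated above.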

Theorem of uniqueness: if two potential functions assume the same 
value along any arbitrarily small interval of the axis of symmetry, they are 
identical. 

The proof may be directly obtained from the corresponding theorem 
concerning the analytic functions of a complex variable. 


Let V(x, y) and V*(x, y) be two potential distributions which assume
the same value along a small portion αβ of the OX axis (fig. 1), and let
v(x+iy) and v*(x+iy) be the corresponding complex functions defined
above:

$$V(x, 0) = v(x), \qquad V(x, y) = \operatorname{Re}\, v(x+iy),$$
$$V^*(x, 0) = v^*(x), \qquad V^*(x, y) = \operatorname{Re}\, v^*(x+iy).$$

Consider now the analytic function φ(x+iy) such that

$$\phi(x+iy) = v^*(x+iy) - v(x+iy).$$

Expanding φ(x+iy) in Taylor's series, with a point I inside αβ taken as
origin, we have

$$\phi(x+iy) = \phi_I + z\,\frac{d\phi_I}{dz} + \frac{z^2}{2!}\frac{d^2\phi_I}{dz^2} + \dots + \frac{z^n}{n!}\frac{d^n\phi_I}{dz^n} + \dots \tag{4}$$

where

$$z = x+iy,$$

and the subscript I refers to values taken at the point I.

Since φ(x+iy) is an analytic function, the value of its derivatives at
any given point does not depend on the direction along which this
derivative is taken. Therefore

$$\frac{d\phi_I}{dz} = \frac{\partial\phi_I}{\partial x}, \quad \dots, \quad \frac{d^n\phi_I}{dz^n} = \frac{\partial^n\phi_I}{\partial x^n}.$$

On the other hand φ(z) is equal to zero along αβ by definition. Hence
its derivatives with respect to x at the point I, and therefore also all
the terms of the Taylor series (4), will vanish. Thus φ(x+iy)=0 inside
the circle of convergence γ of the series (4), whose centre is I (fig. 1).
Using analytic continuation it could similarly be shown that φ(x+iy)=0
everywhere. Hence

$$v^*(x+iy) = v(x+iy)$$

and

$$V^*(x, y) = V(x, y).$$


Thus an analytic potential distribution is completely determined by 
its values along an infinitesimal portion of the axis of symmetry. 

It has to be noted that formulae (2) and (3) tend to give the false impression
that the potential distribution along the whole axis OX can be chosen
in an arbitrary way. If v(x+iy) is analytic, its values along the whole
axis OX are implicitly defined by its values along an infinitesimal portion
of this axis. On the other hand if v(x+iy) is not analytic along the axis
OX taken as a whole, V(x, y) as defined by (2) or (3) will not be analytic
in the neighbourhood of the OX axis.

As an example consider the function V(x, 0) defined by

$$V(x, 0) = -\cos x \quad \text{for } -\pi < x < \pi,$$
$$V(x, 0) = 1 \quad \text{for } x \ge \pi \text{ and } x \le -\pi.$$


Such a function V(x, 0) is continuous as well as its first derivative (see
fig. 2a), but it is not analytic as a whole, on account of the discontinuity
of higher derivatives at the points x=±π, y=0. The potential in the
plane, as derived from (2) or (3), is given by

$$V(x, y) = -\cos x \cosh y \quad \text{for } -\pi < x < \pi,$$
$$V(x, y) = 1 \quad \text{for } x > \pi \text{ and } x < -\pi.$$

Fig. 2a. Potential variation along the OX axis. V(x)=−cos x for −π<x<π; V(x)=1 for
x>π and x<−π.

Fig. 2b. Potential variation along the parallels to the OY axis corresponding to x=π−ε
and x=π+ε.


Such a potential is discontinuous along the lines parallel to OY, for
x=−π and x=+π, since for y≠0

$$\lim_{\epsilon \to 0} V(\pi-\epsilon, y) = \cosh y,$$

$$\lim_{\epsilon \to 0} V(\pi+\epsilon, y) = 1,$$

and similarly at x=−π (fig. 2b).
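The jump across the line x=π can be made concrete with a few lines of Python. This is an illustrative sketch added here, not the author's computation; the function name is hypothetical. It evaluates the piecewise potential just inside and just outside x=π.

```python
import math


def piecewise_potential(x, y):
    """Potential obtained by applying formula (2) separately to each piece
    of the axial distribution: -cos x cosh y inside (-pi, pi), 1 outside."""
    if -math.pi < x < math.pi:
        return -math.cos(x) * math.cosh(y)
    return 1.0


if __name__ == "__main__":
    eps = 1e-9
    for y in (0.0, 0.5, 1.0):
        inside = piecewise_potential(math.pi - eps, y)
        outside = piecewise_potential(math.pi + eps, y)
        # On the axis (y = 0) the two values agree; off the axis they differ
        # by cosh(y) - 1.
        print(y, inside, outside)
```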


§ 3. APPROXIMATE PRACTICAL SOLUTIONS.


The uniqueness theorem appears to impose very drastic restrictions on 
the solution of potential problems. It states that potential distributions, 
even if they are continuous, cannot be arbitrarily prescribed on the axis 
of symmetry, if an analytic solution is required, and that no alteration 
of an analytic potential distribution along a finite portion of the axis 
can be made, without introducing discontinuities, and other very serious
singularities.

However these theoretical limitations are of little consequence in 
practical cases for two reasons. The physicist wants to obtain a given 
potential distribution along a finite portion of the axis of symmetry, and 
not along the whole axis. Moreover, this potential distribution need be 
realized only approximately. Physical quantities are never measured
exactly, and rigorous coincidence is a meaningless concept in physics. 

By a theorem of Weierstrass (Goursat 1927), if a function is continuous 
over a finite interval of the real axis, it can be approximated over this 
interval with any accuracy by a polynomial †. The following propositions
can be derived from this theorem. 


Proposition 1.

Any continuous potential distribution, prescribed along a finite section 
of the axis of symmetry, can be realized with any degree of approximation 
by an infinity of different potential distributions. 


Proposition 2. 
These potential distributions can be constructed without any singularity 
in the plane. 


Proposition 3. 

These potential distributions can be constructed with prescribed values 
at any number of points outside the axis of symmetry. 

To prove these propositions consider a real polynomial P(x), which
approximates the prescribed potential function v(x)=V(x, 0) with the
required degree of accuracy over the range a<x<b.

If we take

$$V(x, y) = \operatorname{Re}\, P(z),$$

we obtain a harmonic function which is symmetrical with respect to OX,
since P(x) is real. It has no singularity, and varies, within the accuracy
desired, as the function v(x) over the interval a<x<b.

† The approximation of a function by a finite number of terms of its Taylor
series is a particular case of Weierstrass' theorem, since it applies only when
the function is analytic, and the portion of the real axis over which the
approximation is desired lies within the circle of convergence.

Since the polynomials of approximation P(z) are obviously infinite in
number, the number of possible potential distributions is also infinite.

This proves propositions 1 and 2. 
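That Re P(z) is harmonic and symmetrical about OX for any real polynomial can be verified numerically. The sketch below is an illustration added here, with arbitrarily assumed coefficients (`SAMPLE_COEFFS`) and hypothetical function names. It evaluates Re P(x+iy) by Horner's scheme and estimates the Laplacian with a five-point finite difference.

```python
def re_polynomial(coeffs, x, y):
    """Return Re P(x + iy) for P(z) = sum(coeffs[k] * z**k), real coeffs."""
    z = complex(x, y)
    value = 0j
    for c in reversed(coeffs):  # Horner's scheme, highest power first
        value = value * z + c
    return value.real


def discrete_laplacian(coeffs, x, y, h=1e-3):
    """Five-point finite-difference estimate of V_xx + V_yy for V = Re P."""
    centre = re_polynomial(coeffs, x, y)
    neighbours = (
        re_polynomial(coeffs, x + h, y)
        + re_polynomial(coeffs, x - h, y)
        + re_polynomial(coeffs, x, y + h)
        + re_polynomial(coeffs, x, y - h)
    )
    return (neighbours - 4.0 * centre) / h ** 2


# Assumed example coefficients c_0 .. c_4; any real values would do.
SAMPLE_COEFFS = [1.0, -0.9, 0.6, -0.2, 0.05]

if __name__ == "__main__":
    print(discrete_laplacian(SAMPLE_COEFFS, 0.4, 0.7))  # close to zero
    print(re_polynomial(SAMPLE_COEFFS, 0.4, 0.7),
          re_polynomial(SAMPLE_COEFFS, 0.4, -0.7))      # equal values
```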

Let us note that although these possible potential distributions are
very similar along ab they become very different when x²+y² increases.
Consider for instance two polynomials P(x) and P*(x), such that for
a<x<b

$$|v(x) - P^*(x)| < \epsilon/2, \qquad |v(x) - P(x)| < \epsilon/2,$$

where ε is a given number, as small as desired, so that |P*(x)−P(x)|<ε along ab.

On the other hand, since Re P*(z)−Re P(z) is a polynomial in x and y,
|Re P*(z)−Re P(z)| becomes infinite for x→∞ or y→∞.

Moreover, the potential distributions can be made very different 
even in the neighbourhood of the axis of symmetry, as follows from 
proposition 3, whose proof is given below. 

We want to find a polynomial P(x) such that

$$|v(x) - P(x)| < \epsilon \tag{5}$$

for a<x<b, and

$$\operatorname{Re} P(z_j) = \operatorname{Re} P(\bar z_j) = V_j \quad (j = 1, 2, \dots) \tag{6}$$

at the points $z_j = x_j + iy_j$, $\bar z_j = x_j - iy_j$ outside ab.


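The function V(x, y)=Re P(z) will then satisfy the conditions required.
Consider the polynomial

$$P(z) = A(z)\left\{\sum_j V_j\left[\frac{1}{(z-z_j)A'(z_j)} + \frac{1}{(z-\bar z_j)A'(\bar z_j)}\right] + p(z)\right\},$$

where

$$A(z) = \prod_j (z-z_j)(z-\bar z_j),$$

and p(z) is a polynomial to be determined later on.
The polynomial P(z) obviously satisfies condition (6), since

$$A(z_j) = A(\bar z_j) = 0$$

and

$$\left[\frac{A(z)}{(z-z_j)A'(z_j)}\right]_{z=z_j} = \left[\frac{A(z)}{(z-\bar z_j)A'(\bar z_j)}\right]_{z=\bar z_j} = 1.$$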


Consider now condition (5). Let us put

$$f(x) = \frac{v(x)}{A(x)} - \sum_j V_j\left[\frac{1}{(x-z_j)A'(z_j)} + \frac{1}{(x-\bar z_j)A'(\bar z_j)}\right].$$

The function f(x) is continuous along ab and is real for x real, since A(x)
is real, and $1/[(x-z_j)A'(z_j)]$ and $1/[(x-\bar z_j)A'(\bar z_j)]$ are conjugate.

Therefore it follows from Weierstrass' theorem that we can choose the
real polynomial p(x) such that

$$\left|\frac{v(x)}{A(x)} - \sum_j V_j\left[\frac{1}{(x-z_j)A'(z_j)} + \frac{1}{(x-\bar z_j)A'(\bar z_j)}\right] - p(x)\right| < \frac{\epsilon}{\operatorname{Max}|A(x)|},$$

where Max |A(x)| is the maximum value of |A(x)| along the interval ab.
Multiplying both sides of the inequality above by |A(x)| we thus have

$$\left|v(x) - A(x)\sum_j V_j\left[\frac{1}{(x-z_j)A'(z_j)} + \frac{1}{(x-\bar z_j)A'(\bar z_j)}\right] - p(x)A(x)\right| < \frac{|A(x)|}{\operatorname{Max}|A(x)|}\,\epsilon \le \epsilon,$$

and the polynomial P(z) satisfies condition (5).
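A construction of this kind can be tried numerically. The Python sketch below is an illustration added here, not the author's computation. It assumes the form P(z) = Σ V_j [A(z)/((z−z_j)A'(z_j)) + A(z)/((z−z̄_j)A'(z̄_j))] + p(z)A(z), with A(z) the product of the factors (z−z_j)(z−z̄_j), and checks that P takes the prescribed values at the chosen points whatever real polynomial p is used. The sample points, values and function names are hypothetical.

```python
def lagrange_factor(z, root, roots):
    """A(z) / ((z - root) * A'(root)), written as a product over the other
    roots so that it can be evaluated at z == root without dividing by zero."""
    value = 1.0 + 0.0j
    for other in roots:
        if other != root:
            value *= (z - other) / (root - other)
    return value


def build_polynomial(points, values, p=lambda z: 0.0):
    """Return P(z) with P(z_j) = P(conj(z_j)) = V_j.

    points: distinct complex numbers z_j off the real axis (assumed)
    values: the real numbers V_j
    p:      any real polynomial; it does not affect the values at z_j
    """
    roots = []
    for zj in points:
        roots.extend([zj, zj.conjugate()])

    def node_polynomial(z):
        out = 1.0 + 0.0j
        for r in roots:
            out *= z - r
        return out

    def polynomial(z):
        z = complex(z)
        total = p(z) * node_polynomial(z)
        for zj, vj in zip(points, values):
            total += vj * (lagrange_factor(z, zj, roots)
                           + lagrange_factor(z, zj.conjugate(), roots))
        return total

    return polynomial


if __name__ == "__main__":
    pts = [complex(0.5, 1.0), complex(-1.0, 0.3)]
    vals = [2.0, -1.5]
    P = build_polynomial(pts, vals, p=lambda z: 0.2 * z - 0.1)
    print(P(pts[0]), P(pts[1]), P(0.3))
```

In the proof, p(x) is then chosen by Weierstrass' theorem so that condition (5) also holds; the sketch leaves that choice open and only demonstrates condition (6).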

It has to be noted that proposition 3 gives only a particular instance 
of various supplementary conditions that can be imposed on the potential 
function which approximates a given function along a finite portion of 
the axis of symmetry. Quite a wide range of different supplementary 
conditions could be imposed as well. 

In practice the polynomials approximating a given function are usually 
taken to be either polynomials obtained through interpolation (Whittaker 
and Robinson 1940) or sums of orthogonal polynomials (Michell and 
Belz 1937). Trigonometric polynomials, instead of algebraic polynomials, 
can also be used. 

Figs. 3a and 3b illustrate Proposition 2, concerning the elimination
of singularities.

The potential function v(x) to be obtained in the interval −2<x<2,
y=0, is given as follows

$$v(x) = \frac{1}{1+x^2}. \tag{7}$$

Therefore the potential function which gives exactly (7) in the interval
(−2, +2) is

$$V(x, y) = \operatorname{Re}\frac{1}{1+(x+iy)^2} = \frac{1+x^2-y^2}{(1+x^2-y^2)^2+4x^2y^2}. \tag{8}$$

Fig. 3a gives the equipotential lines corresponding to (8). The point
x=0, y=1 is a singular point, and in its neighbourhood V(x, y) takes all
the values from −∞ to +∞.

Fig. 3b shows an approximate solution to the problem. We wanted
to obtain a polynomial P(x) such that |v(x)−P(x)| and |dv/dx−dP/dx|
are small. Therefore a polynomial of interpolation of dv/dx of 9th degree
has been used †, which takes the same value as dv/dx at the points

$$x_j = \pm 2\cos\frac{(2j-1)\pi}{20}, \qquad j = 1, 2, \dots, 5.$$

(As shown in the Appendix this choice of points leads to a good approximation
by interpolation polynomials.)
We then obtain

$$\frac{dP}{dx} = -1.89485x + 2.547087x^3 - 1.477347x^5 + 0.383708x^7 - 0.0363642x^9,$$

and by integration

$$P(x) = 1 - 0.94742x^2 + 0.63677x^4 - 0.246224x^6 + 0.0479635x^8 - 0.00363642x^{10}.$$

The error limits in the interval −2<x<2, y=0 were found to be

$$\left|\frac{dv}{dx} - \frac{dP}{dx}\right| < 0.05, \qquad |v(x) - P(x)| < 0.015,$$

where

$$\operatorname{Max}\left|\frac{dv}{dx}\right| = 0.65, \qquad \operatorname{Max}|v(x)| = 1.$$

Fig. 3a. Potential distribution which corresponds to v(x)=1/(1+x²).

Fig. 3b. Approximate solution. The potential along the OX axis approximates the
function 1/(1+x²) with an error smaller than 0.015.
da ’ ; 


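The approximate potential distribution V*(x, y) is given by

$$V^*(x, y) = \operatorname{Re} P(z) = P(x) - \frac{y^2}{2!}\frac{d^2P}{dx^2} + \frac{y^4}{4!}\frac{d^4P}{dx^4} - \dots - \frac{y^{10}}{10!}\frac{d^{10}P}{dx^{10}}.$$

The last expression is an application of formula (3).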

Fig. 3b shows the equipotential lines corresponding to V*(x, y). They
are very similar to the lines of fig. 3a in the strip 0<y<0.10. They
become very different for values of y larger than 0.30. Of course V*(x, y)
has no singularities, and the branch point R is a point where the electric
field is zero.
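The quoted error limits can be re-checked directly from the printed coefficients. The Python sketch below is an illustration added here, not the author's computation. It assumes P(x) = 1 − 0.94742x² + 0.63677x⁴ − 0.246224x⁶ + 0.0479635x⁸ − 0.00363642x¹⁰ as read from the text, samples the interval −2≤x≤2, and also compares Re P(z) with the exact potential a short distance off the axis. The function names and the sampling density are assumptions.

```python
# Coefficients of x**0, x**2, ..., x**10 as printed in the text.
COEFFS = [1.0, -0.94742, 0.63677, -0.246224, 0.0479635, -0.00363642]


def v(x):
    """Prescribed axial potential (7)."""
    return 1.0 / (1.0 + x * x)


def dv(x):
    """Derivative of the prescribed axial potential."""
    return -2.0 * x / (1.0 + x * x) ** 2


def P(z):
    """Approximating polynomial; accepts real or complex arguments."""
    return sum(c * z ** (2 * k) for k, c in enumerate(COEFFS))


def dP(x):
    """Derivative of the approximating polynomial on the axis."""
    return sum(2 * k * c * x ** (2 * k - 1) for k, c in enumerate(COEFFS) if k > 0)


def max_errors(samples=2001):
    """Largest sampled |v - P| and |dv/dx - dP/dx| on -2 <= x <= 2."""
    err_v = err_dv = 0.0
    for i in range(samples):
        x = -2.0 + 4.0 * i / (samples - 1)
        err_v = max(err_v, abs(v(x) - P(x)))
        err_dv = max(err_dv, abs(dv(x) - dP(x)))
    return err_v, err_dv


def approx_potential(x, y):
    """V*(x, y) = Re P(x + iy); finite everywhere."""
    return P(complex(x, y)).real


def exact_potential(x, y):
    """Formula (8); singular at x = 0, y = +1 or -1."""
    z = complex(x, y)
    return (1.0 / (1.0 + z * z)).real


if __name__ == "__main__":
    print(max_errors())
    print(exact_potential(0.0, 0.1), approx_potential(0.0, 0.1))
    print(approx_potential(0.0, 1.0))  # finite where (8) is singular
```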


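§ 4. CASE OF ROTATIONAL SYMMETRY.

The same general results are obtained in this case, if formulae (1), (2) and (3)
are replaced by

$$\frac{\partial^2 V}{\partial z^2} + \frac{1}{r}\frac{\partial V}{\partial r} + \frac{\partial^2 V}{\partial r^2} = 0,$$

$$V(r, z) = \frac{1}{\pi}\int_0^{\pi} v(z + ir\cos\phi)\, d\phi,$$

$$V(r, z) = v(z) - \frac{r^2}{2^2}\frac{d^2v}{dz^2} + \frac{r^4}{2^2 \cdot 4^2}\frac{d^4v}{dz^4} - \dots + \frac{(-1)^n r^{2n}}{2^2 \cdot 4^2 \cdots (2n)^2}\frac{d^{2n}v}{dz^{2n}} + \dots \tag{9}$$

where r and z are cylindrical coordinates.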
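The integral formula and the series (9) can be compared on a simple case. The Python sketch below is an illustration added here. It assumes the axial potential v(z) = z⁴, for which the series terminates after three terms, and evaluates the integral by the midpoint rule. The function names and the number of integration steps are assumptions.

```python
import math


def v(z):
    """Assumed axial potential, chosen so that series (9) terminates."""
    return z ** 4


def integral_formula(r, z, steps=64):
    """(1/pi) * integral over 0..pi of v(z + i r cos(phi)) d(phi),
    evaluated with the midpoint rule."""
    total = 0j
    for k in range(steps):
        phi = math.pi * (k + 0.5) / steps
        total += v(complex(z, r * math.cos(phi)))
    return (total / steps).real


def series_formula(r, z):
    """Series (9) for v(z) = z**4: v'' = 12 z**2, v'''' = 24, higher terms vanish."""
    return z ** 4 - (r ** 2 / 2 ** 2) * 12 * z ** 2 + (r ** 4 / (2 ** 2 * 4 ** 2)) * 24


if __name__ == "__main__":
    print(integral_formula(0.7, 1.2), series_formula(0.7, 1.2))
```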


† If |dv/dx − dP/dx| < ε we have

$$|v(x) - P(x)| = \left|\int_0^x \left(\frac{dv}{dx} - \frac{dP}{dx}\right) dx\right| < \epsilon x.$$

Hence a good approximation of dv/dx leads to a good approximation of v(x) (if x is not
too large), whereas the converse is not true.


Two interesting examples concerning this case are given by N. Plass
(1942), who considers the equipotential lines which produce the axial
potentials v(z)=A exp (−z²/2) (Scherzer's lens of minimum aberration),
and v(z)=A tanh z. Plass uses formula (9) where the functions of z are
also expanded in Taylor series, and retains only a finite number of terms.
N. Plass considers his method to be valid for small values of r only,
because he implicitly assumes that the equipotential lines must have
approximately the shape of those derived from the rigorous solution.
It is seen from the present paper that such an assumption is not necessary,
and therefore the results obtained by N. Plass can be extended to large
values of r. It has also to be noted that a better approximation is
usually obtained when v(z) is expressed by an interpolation polynomial
(see Appendix), or orthogonal polynomials, instead of using a finite Taylor
series.


ACKNOWLEDGMENTS. 

The author is indebted to Dr. D. Gabor of the Imperial College for 
suggesting this research, and wishes to express appreciation for his 
helpful interest in this work. 

Thanks are also due to the Directors of Mullard Radio Valve Co. for 
permission to publish this paper. 


APPENDIX.


ABOUT THE CHOICE OF INTERPOLATION POINTS. 


Known and useful results based on an extremal property of Tchebycheff’s 
polynomials are expounded here, since the works dealing with them are 
not easily available. 

It is well known that if $P_{n-1}(x)$ is the polynomial of degree n−1 which
takes the same values as f(x) at points $x_j$ (j=1, 2, ..., n) in the interval
ab, then in this interval (Whittaker and Robinson 1940),

$$|f(x) - P_{n-1}(x)| \le \frac{1}{n!}\,\operatorname{Max}\left|\frac{d^n f}{dx^n}\right|\,\operatorname{Max}|A_n(x)|, \tag{10}$$

where

$$A_n(x) = (x-x_1)(x-x_2)\dots(x-x_n), \tag{11}$$

and where it is assumed that $d^n f/dx^n$ is continuous.

To obtain the smallest upper limit for $|f(x)-P_{n-1}(x)|$ as given by (10),
the points of interpolation $x_1, x_2, \dots, x_n$ must be such as to ensure the
smallest value of Max $|A_n(x)|$ over a≤x≤b.

The interval of interpolation ab may be assumed to correspond to
−1≤x≤1, which can always be the case after a suitable change of
variables. Then the smallest value of Max $|A_n(x)|$ over −1≤x≤1 is obtained if the
points $x_1, x_2, \dots, x_n$ are the zeros of the Tchebycheff polynomial of
degree n, $T_n(x)$, or more precisely if

$$A_n(x) = T_n(x).$$


This theorem is due to Tchebycheff (1899), and a short proof of it will
be given here.
The Tchebycheff polynomial $T_n(x)$ is given by

$$T_n(x) = \frac{\cos(n\cos^{-1}x)}{2^{n-1}} = x^n + a_1x^{n-1} + \dots + a_n. \tag{12}$$

Its zeros are in the interval (−1, +1), and are given by $x_j = \cos\frac{(2j-1)\pi}{2n}$,
j=1, 2, ..., n. Therefore $T_n(x)$ can be put in the form (11).
On the other hand, for $A_n(x)$ different from $T_n(x)$, we have

$$\operatorname*{Max}_{-1\le x\le 1}|A_n(x)| > \operatorname*{Max}_{-1\le x\le 1}|T_n(x)|. \tag{13}$$

This can be derived as follows.
From (12) it is seen that

$$\operatorname*{Max}_{-1\le x\le 1}|T_n(x)| = \frac{1}{2^{n-1}}$$

and

$$T_n\left(\cos\frac{m\pi}{n}\right) = \frac{(-1)^m}{2^{n-1}}, \qquad m = 0, 1, \dots, n.$$


Let us now suppose that

$$\operatorname*{Max}_{-1\le x\le 1}|A_n(x)| < \operatorname*{Max}_{-1\le x\le 1}|T_n(x)|.$$

Then the polynomial $T_n(x)-A_n(x)$ will be positive at the points
$x_m = \cos\frac{m\pi}{n}$ corresponding to m even, and negative at the points $x_m$
corresponding to m odd, and hence $T_n(x)-A_n(x)$ will change its sign n
times. It must therefore have n zeros, which is impossible since
$T_n(x)-A_n(x)$ is of (n−1)th degree.

For a similar reason it is not possible to have

$$\operatorname*{Max}_{-1\le x\le 1}|A_n(x)| = \operatorname*{Max}_{-1\le x\le 1}|T_n(x)|.$$

Hence (13) has been proved, and the upper limit of $|f(x)-P_{n-1}(x)|$ as
given by (10) is minimum when $A_n(x)=T_n(x)$.

‡ The coefficients $a_j$ of (12) are determined by putting x = cos θ:

$$T_n = \frac{\cos n\theta}{2^{n-1}} = \frac{1}{2^n}\left[\exp(in\theta) + \exp(-in\theta)\right] = \frac{1}{2^n}\left[(\cos\theta + i\sin\theta)^n + (\cos\theta - i\sin\theta)^n\right] = \frac{1}{2^n}\left[\left(x + \sqrt{x^2-1}\right)^n + \left(x - \sqrt{x^2-1}\right)^n\right].$$

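In this case

$$|f(x) - P_{n-1}(x)| \le \frac{1}{2^{n-1}\, n!}\operatorname*{Max}_{-1\le x\le 1}\left|\frac{d^n f}{dx^n}\right|,$$

and it is seen that the upper limit is much lower than when $P_{n-1}(x)$ is a
finite Taylor series $\tau_{n-1}(x)$, in which case we have

$$|f(x) - \tau_{n-1}(x)| \le \frac{1}{n!}\operatorname*{Max}_{-1\le x\le 1}\left|\frac{d^n f}{dx^n}\right|.$$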
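The extremal property can be illustrated by comparing Max |A_n(x)| for two choices of interpolation points. The Python sketch below is an illustration added here. It takes n=10 and compares the zeros of the Tchebycheff polynomial with equally spaced points on −1≤x≤1; the equally spaced points, the sampling density and the function names are assumptions chosen for the comparison.

```python
import math


def chebyshev_nodes(n):
    """Zeros of the Tchebycheff polynomial of degree n on (-1, 1)."""
    return [math.cos((2 * j - 1) * math.pi / (2 * n)) for j in range(1, n + 1)]


def equispaced_nodes(n):
    """n equally spaced points on [-1, 1], end points included."""
    return [-1.0 + 2.0 * j / (n - 1) for j in range(n)]


def max_node_polynomial(nodes, samples=20001):
    """Largest sampled value of |A_n(x)| = |(x - x_1)...(x - x_n)| on [-1, 1]."""
    best = 0.0
    for k in range(samples):
        x = -1.0 + 2.0 * k / (samples - 1)
        product = 1.0
        for xj in nodes:
            product *= x - xj
        best = max(best, abs(product))
    return best


if __name__ == "__main__":
    n = 10
    print(max_node_polynomial(chebyshev_nodes(n)), 2.0 ** (1 - n))
    print(max_node_polynomial(equispaced_nodes(n)))
```

With the Tchebycheff zeros the maximum equals 1/2^(n−1), in agreement with (12); the equally spaced points give a noticeably larger value.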


REFERENCES.


COSSLETT, V. E., 1946, Introduction to Electron Optics, pp. 25, 128. Oxford.

GOURSAT, E., 1927, Cours d'Analyse Mathématique, 1, 498-500. Paris.

MICHELL, J. H., and BELZ, M. M., 1937, The Elements of Mathematical Analysis,
Vol. II, Chapter XVII.

PLASS, G. N., 1942, "Electrostatic Electron Lenses with a Minimum of Spherical
Aberration", Journal of Applied Physics, 13, 49-55.

TCHEBYCHEFF, P. L., 1899, Oeuvres, Vol. I. St. Petersburg.

WHITTAKER, E., and ROBINSON, G., 1940, The Calculus of Observations,
Chapters I and II.




XX. Properties of Slow Electrons in Polar Materials*. 


By H. Fröhlich, H. Pelzer† and S. Zienau,
Department of Theoretical Physics, The University, Liverpool‡.


[Received November 25, 1949.] 


SUMMARY. 


Using a variational method we have investigated the properties of the
lowest energy levels in a range ℏω above the ground level of the system
consisting of an electron and a continuous dielectric medium. The latter
is supposed to have a single vibrational frequency ω/2π for long longitudinal
polarization waves. The interaction between the electron and the medium
then depends on three parameters, the static dielectric constant ε, the
optical refractive index $\sqrt{\epsilon_\infty}$, and ω. The replacement of a crystalline
lattice by a continuum is a good approximation if the length $b=(\hbar/2m\omega)^{1/2}$
is large compared with the lattice distance, a condition which is usually
fulfilled.

We find that the energy of the ground level may be considerably
(compared with ℏω) below the energy which the system would have in
the absence of interaction. The electron can be found with equal
probability at any point in the medium. The average polarization of
the medium therefore vanishes at any point. This does not hold,
however, for the average polarization at a given distance from the electron.
This quantity varies at large distances as the polarization of a point charge,
but shows deviations below distances of the order b. The energy of
interaction depends very little on the average velocity of the electron.
Slow electrons, therefore, behave very similarly to free electrons (§4).
It follows then that self trapping in the lattice, a suggestion which
has often been discussed, does not exist (§5).

Modified forms of previous formulae of the mean free path of electrons
are given in §6. It is shown, however, that the validity of the whole
method used at present to calculate mean free paths requires further
investigation.


§ 1. INTRODUCTION.


Through combined theoretical and experimental work during the past
twenty years it has been established that slow electrons in periodic
crystal lattices behave in many respects like free electrons; the main
influence of the periodic field is to replace the electronic mass by a slightly
different effective mass. The effect of deviations from the periodicity,
e.g. by thermal motion or through lattice defects, can then be described
as leading to scattering of otherwise free electrons. It might seem that
ionic crystals differ in this respect from others. For in these crystals
the motion of ions has a resonance at frequencies which makes their
displacement by moving charges strongly dependent on the velocity of
the latter. In fact it has been suggested that in view of their polarization
an electron might be trapped near any point in the lattice; and that
a certain energy would have to be supplied to transfer it into a state
in which it can move more freely. Our calculations show that this is
not the case.

* Based on Report L/T 221 of the British Electrical and Allied Industries
Research Association (E.R.A.).

† On the research staff of the British Electrical and Allied Industries
Research Association.

‡ Communicated by the Authors.

Electronic motion in ionic crystals is of importance for a considerable 
number of physical processes. In the present paper, therefore, we have 
made a systematic study of the properties of the lowest energy levels 
of an electron, assuming it to be free except for reaction forces due to 
the enforced displacements of ions. 

To obtain a qualitative idea of the interaction between an electron and 
the polarization produced by it let us first consider the behaviour of a 
moving point charge e [cf. Fröhlich and Pelzer (1950)] and then discuss
the modifications required if the point charge is replaced by an electron.
The polarization P in an ionic crystal can be considered as a superposition
of two components [cf. e.g. Fröhlich (1949)]: (i) the optical polarization $P_0$,
which is due to the displacement of bound electrons of the lattice with a
resonance frequency in the optical or ultra-violet region; (ii) the infra-red
polarization $P_{ir}$, whose resonance frequency ω/2π lies in the infra-red
and is connected with the displacement of ions. For the slow velocities
v of the point charge in which we are interested the optical polarization
is always excited to its static value whatever the velocity v. The
interaction of $P_0$ with the point charge (or the electron) therefore is of
no interest to us as it does not influence the energy difference between
states of different velocity. The infra-red polarization behaves differently
in this respect because of the lower resonance frequency of ionic oscillations.
At large distances from the point charge (position $r_0$) the polarization $P_{ir}$
is of course the same as in the static case. It is thus obtained by subtracting
the optical polarization $P_0$ from the total polarization P, and
can be derived from an electric potential Φ(r) by the equation

$$4\pi \mathbf{P}_{ir}(\mathbf{r}) = \operatorname{grad} \Phi(\mathbf{r}), \tag{1.1}$$

where according to electrostatics

$$\Phi(\mathbf{r}) = \left(\frac{1}{\epsilon} - \frac{1}{\epsilon_\infty}\right)\frac{e}{|\mathbf{r}-\mathbf{r}_0|}. \tag{1.2}$$

Here r is any point at sufficient distance from $r_0$, ε is the static dielectric
constant and $\epsilon_\infty$ is the dielectric constant at high frequencies where the
ionic motion cannot follow; $\sqrt{\epsilon_\infty}$ is the optical refractive index. An ion is
displaced as in the static case only if its distance from the approaching
point charge is sufficiently larger than v/ω. The reason is that it takes
a time of the order 1/ω to displace an ion. Below this distance the point
charge passes so quickly that its action on the ion can be considered as
a shock, exciting oscillation after it has passed. In this case the potential
at the position of the point charge is obtained by replacing $|r-r_0|$ in
(1.2) by v/ω; multiplication by e leads to the energy of interaction. To
obtain the net interaction energy we must add the energy required to
polarize the lattice. For harmonically bound ions this reduces the
interaction energy to about half of the above value. Its order of
magnitude is thus

$$\frac{e^2}{2}\left(\frac{1}{\epsilon} - \frac{1}{\epsilon_\infty}\right)\frac{\omega}{v}. \tag{1.3}$$


If the point charge is replaced by an electron the uncertainty relation
requires that we cannot confine it to a small region in space without a
considerable uncertainty in its velocity. Let this region be of linear
dimension a around a point $r_0$. Thus at a large distance from $r_0$ the
potential is still given by expression (1.2). At short distances there are
now two competitive effects which prevent the potential from falling
below a certain value. Considering first the uncertainty in position
only, the magnitude of the lowest value of the potential would be obtained
by replacing $|r-r_0|$ in (1.2) by a. On the other hand, considering the
effect of the velocity only, $|r-r_0|$ must be replaced by v/ω. We should
expect that it is the mechanism which leads to the larger critical distance
which decides the order of magnitude of the potential inside the region
to which the electron is confined. The two distances are not independent,
however, because according to the uncertainty relation Δv ≈ ℏ/ma and Δv
decides the order of magnitude of v if the point $r_0$ is either at rest, or
moving only slowly. A minimum for the critical distance exists, therefore,
being the lower value of the two quantities v/ω and ℏ/mv. It is hence
of the order

$$b = \left(\frac{\hbar}{2m\omega}\right)^{1/2}, \tag{1.4}$$

a length which will be of importance later on*. It now follows that a
minimum value for the negative interaction energy should exist whose
magnitude is of the order

$$W \simeq \frac{1}{2}\left(\frac{1}{\epsilon_\infty} - \frac{1}{\epsilon}\right)\frac{e^2}{b}. \tag{1.5}$$
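To give a feeling for the magnitudes involved, the length b = (ℏ/2mω)^(1/2) and an energy of the form ½(1/ε_∞ − 1/ε)e²/b can be evaluated for sample values. The Python sketch below is an illustration added here, not taken from the paper. The values ω = 5×10¹³ s⁻¹, ε = 5.6 and ε_∞ = 2.25 are assumptions chosen only as plausible orders of magnitude for an ionic crystal, the free electron mass is used, and the Gaussian expression e²/b is converted to SI units as e²/(4πε₀b).

```python
import math

# Physical constants in SI units.
HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg, free electron mass
E_CHARGE = 1.602176634e-19  # C
EPS0 = 8.8541878128e-12     # F / m


def critical_length(omega, mass=M_E):
    """Length b = (hbar / (2 m omega))**0.5 in metres."""
    return math.sqrt(HBAR / (2.0 * mass * omega))


def interaction_energy_ev(omega, eps_static, eps_optical):
    """Order-of-magnitude energy 0.5 * (1/eps_optical - 1/eps_static) * e**2 / b,
    written in SI units and returned in electron volts."""
    b = critical_length(omega)
    coulomb_joules = E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * b)
    return 0.5 * (1.0 / eps_optical - 1.0 / eps_static) * coulomb_joules / E_CHARGE


if __name__ == "__main__":
    omega = 5e13  # assumed infra-red angular frequency, rad/s
    print("b (m):", critical_length(omega))
    print("energy (eV):", interaction_energy_ev(omega, eps_static=5.6, eps_optical=2.25))
    print("hbar*omega (eV):", HBAR * omega / E_CHARGE)
```

For these assumed values b is about 1 nm and the energy is several times ℏω.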


It is suggestive to imagine the electron to be trapped by the potential field
due to the polarization induced by itself, as was first done by Landau (1933)
and later on followed up by others. The electron cloud is supposed
to be centred around a fixed point in the medium; the exact
distribution depends on the interaction with the polarization of the medium
which in turn depends on the distribution of charge density. By making
polarization and density distribution self-consistent a solution is obtained
which is supposed to have the following properties: (i) it constitutes
the lowest energy level of the system; (ii) its mobility is so low and its
effective mass so high that the electron can be considered as trapped;
(iii) by supplying a minimum energy the electron can be liberated into
an unbound state, similar to the case that an electron is bound (trapped)
by a permanent potential. These properties, if true, would be of great
importance for the behaviour of electrons in ionic crystals. No attempts
have been made so far to prove any of the three contentions. They are
all objectionable, and we shall find in fact that none are valid (§ 5).

* The exact numerical value has been chosen for later convenience.


§ 2. CALCULATIONS ; GENERAL METHOD. 
Following the treatment in a previous paper [Fröhlich (1937), quoted
as (I.)] we shall investigate properties of a system consisting of an electron
and a continuous dielectric medium possessing a single infra-red vibrational
frequency. The electron is treated as free apart from its interaction with
this medium. Any effect of a periodic lattice field might be taken into
account by the use of an effective electronic mass. In the previous
paper the scattering of a relatively fast (several eV.) electron was
considered by treating its interaction with the medium as a small
perturbation. At present we shall be mainly interested in a calculation
of the position and properties of the ground level of the system, and of
the subsequent levels in a range of the order of 1/10 eV. For this purpose
we require the Hamiltonian H consisting essentially of three parts:
(i) the energy of the dielectric, described by its polarization; (ii) the
electron's energy of interaction with the polarization; (iii) the kinetic
energy of the electron. As indicated in the introduction the energy of
interaction between the electron and the optical polarization is the same
for all states in which we are interested at present. It thus leads to a
constant term in the Hamiltonian which we shall omit. The state of
the dielectric is then entirely described by the space and time dependence
of the infra-red polarization $P_{ir}$. It is convenient to develop $P_{ir}$ into a
Fourier series of polarization waves using a term introduced on a previous
occasion (I.). Thus we can write
$$P_{ir}(\mathbf{r}) = \frac{1}{V^{1/2}} \sum_w \left(a_w \exp\{i\mathbf{w}\cdot\mathbf{r}\} + \text{c.c.}\right), \qquad (2.1)$$
where $V$ denotes the volume in which we are interested and c.c. stands
for the complex conjugate of the preceding expression. In the absence
of external forces (e.g. due to the electron) the $a_w$ depend periodically
on time. As in (I.) longitudinal waves only are of interest because
transverse waves do not interact with an electron. In contrast to (I.)
we shall be mainly interested in polarization waves which are long compared
with the lattice distance. It is for this reason that the dielectric
can be treated as a continuum. The frequency $\omega/2\pi$ of all waves can then
be considered as independent of wavelength. This frequency differs
however from that of very short waves; its value will be discussed in § 4.




It is useful to introduce the electric potential $\Phi(\mathbf{r})$ from which the
polarization $P_{ir}$ can be derived according to (1.1) because then the energy
of interaction is simply given by

$$H_{int} = e\Phi(\mathbf{r}_e), \qquad (2.2)$$

where $\mathbf{r}_e$ is the position of the electron. Introducing two new sets of
coordinates $X_w$, $Y_w$ and a constant $M$ by

$$X_w = a_w + a_w^*, \qquad Y_w/M\omega = -i(a_w - a_w^*), \qquad (2.3)$$

we obtain from (1.1) and (2.1)

$$\Phi(\mathbf{r}) = \frac{4\pi}{V^{1/2}} \sum_w \frac{1}{w}\left(X_w \sin \mathbf{w}\cdot\mathbf{r} + \frac{Y_w}{M\omega}\cos \mathbf{w}\cdot\mathbf{r}\right). \qquad (2.4)$$


Clearly without a field $Y_w = M\dot{X}_w$. The constant $M$ was introduced
so that formally the medium is represented by a set of harmonic
oscillators with mass $M$, displacement $X_w$ and momentum $Y_w$. We
shall choose the constant $M$ in such a way that the oscillator energy
is given by

$$H_{osc} = \sum_w \left(\tfrac{1}{2}M\omega^2 X_w^2 + \frac{Y_w^2}{2M}\right). \qquad (2.5)$$

It should be clear, however, that neither has $M$ the dimension of a mass
nor $X$ and $Y$ those of length and momentum respectively.

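Denoting by $\mathbf{p}$ the momentum of the electron the total Hamiltonian
is given by

$$H = \sum_w \left(\tfrac{1}{2}M\omega^2 X_w^2 + \frac{Y_w^2}{2M}\right) + e\Phi(\mathbf{r}_e) + \frac{p^2}{2m}. \qquad (2.6)$$

The $X_w$ and $Y_w$, and $\mathbf{r}_e$ and $\mathbf{p}$ can then be treated as sets of canonical
conjugate coordinates.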

We might determine $M$ from the known energy of a simple (e.g.
homogeneous) displacement. We shall instead calculate the potential
due to a point charge at $\mathbf{r}_0$ and determine $M$ by comparison with the known
result (1.2). This will give us an opportunity of demonstrating the
method of handling (2.6) in a simple case. The replacement of the
electron by a point charge implies that the kinetic energy term $p^2/2m$ must
be removed from (2.6) and $\mathbf{r}_e$ must be replaced by the constant $\mathbf{r}_0$. Then
if we introduce the constants

$$\bar{X}_w = -\frac{4\pi e}{V^{1/2}}\frac{1}{w}\frac{\sin \mathbf{w}\cdot\mathbf{r}_0}{M\omega^2}, \qquad \bar{Y}_w = -\frac{4\pi e}{V^{1/2}}\frac{1}{w}\frac{\cos \mathbf{w}\cdot\mathbf{r}_0}{\omega}, \qquad (2.7)$$

clearly

$$H = \sum_w H_w - \frac{8\pi^2 e^2}{VM\omega^2}\sum_w \frac{1}{w^2}, \qquad (2.8)$$

where

$$H_w = \tfrac{1}{2}M\omega^2 (X_w - \bar{X}_w)^2 + \frac{1}{2M}(Y_w - \bar{Y}_w)^2. \qquad (2.9)$$


By this transformation our Hamiltonian has been separated into a
constant term and a set of Hamiltonians of harmonic oscillators, with
coordinates $X_w - \bar{X}_w$ and momenta $Y_w - \bar{Y}_w$.

Quantum-mechanically, $\bar{X}$ and $\bar{Y}$ are the expectation values of $X$ and
$Y$; classically they are the averages over one period. The expectation
value for $\Phi(\mathbf{r})$, using equation (2.4) is thus given by

$$\bar{\Phi}(\mathbf{r}-\mathbf{r}_0) = -\frac{16\pi^2 e}{VM\omega^2}\sum_w \frac{\cos \mathbf{w}\cdot(\mathbf{r}-\mathbf{r}_0)}{w^2}. \qquad (2.10)$$
On this, and on all later occasions, the sum will be transformed into an
integral in the usual way by counting the number of values which $\mathbf{w}$ can
have in a small range $w^2\,dw\,d\Omega$ on the assumption of a periodic boundary
condition. Thus

$$\sum_w (\ldots) = \frac{V}{8\pi^3}\int w^2\,dw\,d\Omega\,(\ldots), \qquad (2.11)$$

where $d\Omega$ is the element of the solid angle in $\mathbf{w}$-space. The integration
goes over all $\mathbf{w}$-space*. The result agrees with (1.2) if we determine
$M$ from

$$\frac{1}{M\omega^2} = \frac{1}{4\pi}\left(\frac{1}{\epsilon_0} - \frac{1}{\epsilon}\right). \qquad (2.12)$$


Unfortunately when replacing the point charge by an electron this 
simple method can no longer be used and a suitable approximative method 
has to be found. The electron as well as the oscillators will be treated 
quantum-mechanically. 

Let us first express the Hamiltonian in a useful matrix representation.
We shall make use of the complete orthogonal set of wave functions
$\Psi_0(\mathbf{K}; n_1, n_2, \ldots n_w, \ldots)$ which diagonalizes $H$ when the interaction
term $e\Phi(\mathbf{r}_e)$ in (2.6) is neglected. These functions have previously been
discussed in (I.). They can be written as

$$\Psi_0(\mathbf{K}; n_1, n_2, \ldots n_w, \ldots) = \frac{1}{V^{1/2}}\exp\{i\mathbf{K}\cdot\mathbf{r}\}\prod_w \psi_{n_w}(X_w), \qquad (2.13)$$

where $\psi_{n_w}(X_w)$ is the wave function of a harmonic oscillator with
coordinate $X_w$ in its $n_w$th excited state. In this representation, the
diagonal elements of the interaction, $H_{int}$, vanish. Thus the diagonal
elements of $H$ are given by

$$H_{\mathbf{K}n,\mathbf{K}n} = \frac{\hbar^2K^2}{2m} + \hbar\omega\sum_w\left(n_w + \tfrac{1}{2}\right), \qquad (2.14)$$

corresponding to the energy when interaction is neglected. The non-diagonal
elements vanish except for those corresponding to the absorption
or emission of one quantum $\hbar\omega$ by the electron. If $\mathbf{w}$ is the wave number


* In (I.) it was necessary to introduce an upper limit $w_0$ of $w$. This is
unnecessary at present as the main contributions to all integrals required will
be found to arise from much smaller values of $w$.




of this quantum then they will be denoted by $(e\Phi)_{\mathbf{K}\mathbf{K}'w}$ to show that only
the interaction energy gives a contribution. Their value is found in
a straightforward way similar as in (I., p. 236) by making use of (2.4)
and of the well known matrix elements of the coordinate and the
momentum of a harmonic oscillator. Introducing the value (2.12)
for $M$ we find

$$(e\Phi)_{\mathbf{K}\mathbf{K}'w} = (e\Phi)_w \begin{cases} (1+n_w)^{1/2} \\ -n_w^{1/2}, \end{cases} \qquad (2.15)$$

where

$$(e\Phi)_w = \frac{4\pi i e}{V^{1/2}}\frac{1}{w}\left[\frac{\hbar\omega}{8\pi}\left(\frac{1}{\epsilon_0}-\frac{1}{\epsilon}\right)\right]^{1/2}. \qquad (2.16)$$

Here $(1+n_w)^{1/2}$ holds in the case of an emission of a quantum $\hbar\omega$ with
wave number $\mathbf{w}$ by the electron, provided

$$\mathbf{K}' = \mathbf{K} - \mathbf{w}. \qquad (2.17)$$

For absorption $-n_w^{1/2}$ must be used instead, and

$$\mathbf{K}' = \mathbf{K} + \mathbf{w}. \qquad (2.18)$$


All non-diagonal elements for which neither of these conditions is fulfilled 
vanish. 

We shall derive our solutions from those of ordinary second order
perturbation theory, which starts in zero order with an electron with
energy smaller than $\hbar\omega$ and with all $n_w = 0$. Such an electron cannot
emit a real quantum. It can, however, emit and reabsorb virtually
single quanta provided (2.17) and (2.18) are fulfilled respectively. The
opposite sequence, first absorption and then emission is impossible as
initially all $n_w$ vanish. After previous emission, just one $n_w = 1$ is
present however. Apart from the sign, both matrix elements are then
equal, putting in (2.15) $n_w = 0$ and $n_w = 1$ respectively. They are then
given by $\pm(e\Phi)_w$, equation (2.16). The energies in zero order in the
initial state $E_0(K)$ and in the intermediate state $E_0(K, w)$ respectively
are given by [making use of (2.14) and (2.17)]

$$E_0(K) = \frac{\hbar^2K^2}{2m}, \qquad E_0(K, w) = \frac{\hbar^2(\mathbf{K}-\mathbf{w})^2}{2m} + \hbar\omega, \qquad (2.19)$$

if we omit the zero point energy of the oscillators, $\sum\hbar\omega/2$. This is
admissible as we are only concerned with energy differences. Thus,
according to the well known formula, the perturbed energy is given by

$$E(K) = E_0(K) - \sum_w \frac{|(e\Phi)_w|^2}{E_0(K, w) - E_0(K)} = \frac{\hbar^2K^2}{2m} - \frac{2m}{\hbar^2}\sum_w \frac{|(e\Phi)_w|^2}{(\mathbf{K}-\mathbf{w})^2 + u^2 - K^2}, \qquad (2.20)$$

where we have introduced a constant $u$ by (cf. 1.4)

$$\frac{\hbar^2u^2}{2m} = \hbar\omega. \qquad (2.21)$$



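The corresponding wave function becomes

$$\Psi(\mathbf{K}) = \Psi_0(\mathbf{K}; 0, 0, \ldots) + \sum_w c_w \Psi_0(\mathbf{K}-\mathbf{w}; 1_w), \qquad (2.22)$$

where

$$c_w = -\frac{(e\Phi)_w}{E_0(K, w) - E_0(K)} = -\frac{2m}{\hbar^2}\frac{(e\Phi)_w}{(\mathbf{K}-\mathbf{w})^2 + u^2 - K^2}. \qquad (2.23)$$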


It consists of the sum of wave functions of our set having the same
momentum $\hbar\mathbf{K}$ with appropriate factors. The first term corresponds
to the case in which no quantum is excited, the remaining functions
describe states with one excited quantum each.

We shall generalize the above expressions by continuing to use a wave
function of the type (2.22) because without specializing the $c_w$ this is the
most general wave function with total momentum $\hbar\mathbf{K}$ containing functions
with none or one excited quantum. It is to be hoped that this will
suffice to describe the lowest energy levels, except in the cases of very
strong interaction. The coefficients $c_w$ and the energy $E(K)$ will be
determined by minimizing the expectation value of the energy, with
respect to $c_w$. Thus


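$$E(K) = \int \Psi^* H \Psi \Big/ \int \Psi^*\Psi. \qquad (2.24)$$

The integration goes over all variables in $\Psi$. Making use of the matrix
elements of $H$ discussed above, we find after multiplication by
$\int\Psi^*\Psi = 1 + \sum_w |c_w|^2$ (the latter holds in view of the orthogonality and
normalization of the functions $\Psi_0$)

$$\left(1 + \sum_w |c_w|^2\right)E(K) = E_0(K) + \sum_w |c_w|^2 E_0(K, w) + \sum_w \left(c_w^*(e\Phi)_w + \text{c.c.}\right). \qquad (2.25)$$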


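Hence by varying with respect to $c_w$ and $c_w^*$ we obtain using (2.19) and
(2.21)

$$c_w = -\frac{(e\Phi)_w}{E_0(K, w) - E(K)} = -\frac{(e\Phi)_w}{\hbar^2[(\mathbf{K}-\mathbf{w})^2 + u^2]/2m - E(K)} \qquad (2.26)$$

and

$$E(K) - E_0(K) = -\sum_w \frac{|(e\Phi)_w|^2}{E_0(K, w) - E(K)} = -\frac{2m}{\hbar^2}\sum_w \frac{|(e\Phi)_w|^2}{(\mathbf{K}-\mathbf{w})^2 + u^2 - 2mE(K)/\hbar^2}. \qquad (2.27)$$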


Both expressions (2.27) and (2.26) turn into the corresponding ones
(2.22) and (2.20) in the perturbation theory if the energy perturbation
is sufficiently small. We also notice that the normalization integral

$$J = \int\Psi^*\Psi = 1 + \sum_w |c_w|^2 \qquad (2.28)$$




must differ very little from unity if perturbation theory holds. This
is not required in the case of the variational method.

Finally we should like to mention that equation (2.27) for the lowest
energy level with momentum $\hbar\mathbf{K}$ can also be obtained by diagonalizing
the matrix representation of $H$ with respect to states containing either
none or one quantum.
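The remark above can be illustrated on a small discrete toy problem. The sketch below is not the authors' calculation: the level energies and couplings are hypothetical numbers, with one zero-quantum state of energy E0 coupled to a few one-quantum states of energies E_i by real couplings g_i. It solves a secular equation of the same form as (2.27), E - E0 = -sum g_i^2/(E_i - E), by bisection, and then checks that the vector (1, c_1, c_2, ...) with c_i = -g_i/(E_i - E), the analogue of (2.26), is an eigenvector of the full matrix.

```python
def lowest_level(E0, levels, couplings, steps=200):
    """Lowest root of E - E0 + sum(g**2 / (Ei - E)) = 0, assuming all Ei > E0."""
    def f(E):
        return E - E0 + sum(g * g / (Ei - E) for Ei, g in zip(levels, couplings))

    # Below min(levels) f increases monotonically, with f(E0) >= 0 >= f(E0 - S).
    S = sum(g * g / (Ei - E0) for Ei, g in zip(levels, couplings))
    lo, hi = E0 - S, E0
    for _ in range(steps):          # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eigen_residual(E0, levels, couplings, E):
    """Largest component of (H - E) v for the trial vector v = (1, c_1, c_2, ...)."""
    c = [-g / (Ei - E) for Ei, g in zip(levels, couplings)]
    first = E0 + sum(g * ci for g, ci in zip(couplings, c)) - E
    rest = [g + (Ei - E) * ci for Ei, g, ci in zip(levels, couplings, c)]
    return max(abs(r) for r in [first] + rest)


# Hypothetical toy numbers, in units of hbar*omega.
E0 = 0.0
levels = [1.0 + 0.5 * k for k in range(1, 6)]
couplings = [0.8 / k for k in range(1, 6)]
E = lowest_level(E0, levels, couplings)
print(E, eigen_residual(E0, levels, couplings, E))
```

The printed residual should be at rounding level, which is the discrete counterpart of the statement that the variational energy coincides with the lowest eigenvalue in the subspace of states with none or one quantum.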


§3. ENERGY, VELOCITY AND OTHER PROPERTIES. 


The above expressions for the energy and the wave functions contain
$\mathbf{K}$ as a parameter. The quantity $\hbar\mathbf{K}$ represents the momentum of the
total system, electron + polarization waves, as can easily be verified.
In fact each of the functions $\Psi_0$ appearing in (2.22) has the same momentum
$\hbar\mathbf{K}$.

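To obtain the energy in an explicit form it is useful to introduce the
following dimensionless quantities [cf. (1.5), (1.4) and (2.21)]

$$x = \frac{K}{u}, \qquad \alpha = \frac{W}{\hbar\omega} = \frac{1}{2}\left(\frac{1}{\epsilon_0} - \frac{1}{\epsilon}\right)\frac{e^2u}{\hbar\omega}, \qquad y = \frac{E_0(K) - E(K)}{\hbar\omega}. \qquad (3.1)$$

The energy can then be written as

$$E(K) = \hbar\omega(x^2 - y), \qquad (3.2)$$

if use is made of equation (2.19).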


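We then find from (2.27) by introducing from (2.16) and using (2.11),
(2.21), (1.5) and (3.1),

$$y = \frac{\alpha u}{2\pi^2}\int\frac{dw\,d\Omega}{(\mathbf{K}-\mathbf{w})^2 + u^2(1+y) - K^2} = \begin{cases} \dfrac{\alpha}{x}\sin^{-1}\dfrac{x}{(1+y)^{1/2}} & \text{if } 1+y \ge x^2, \\[2mm] \dfrac{\alpha}{x}\dfrac{\pi}{2} & \text{if } x^2 \ge 1+y \ge 0, \\[2mm] 0 & \text{if } 1+y < 0. \end{cases} \qquad (3.3)$$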


We shall restrict ourselves mainly to the case $1+y > x^2$ which as we shall
see corresponds to the lowest range of energies. In this case the
denominator in the above integral does not vanish for any real $\mathbf{w}$. In
the other cases the values given above are principal values. In a more
convenient form we thus have

$$\sin\frac{yx}{\alpha} = \frac{x}{(1+y)^{1/2}}. \qquad (3.4)$$

In particular, if $y_0$ denotes the value of $y$ for $x = 0$,

$$y_0(1+y_0)^{1/2} = \alpha. \qquad (3.5)$$




From this expression we see that $\alpha$ acts as an interaction parameter.
If $\alpha \ll 1$, then $y_0 \simeq \alpha$, a result which is identical with that obtained by
perturbation theory.
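Since (3.5) has no convenient closed-form solution, a numerical illustration may help. The following is a minimal sketch, assuming (3.5) reads y0 (1 + y0)^(1/2) = alpha as implied by (4.4); the function name `solve_y0` and the sample values of alpha are illustrative choices only (4.47 is the value the text quotes for the drawings).

```python
from math import sqrt

def solve_y0(alpha, steps=200):
    """Root y0 >= 0 of y0 * sqrt(1 + y0) = alpha, found by bisection."""
    lo, hi = 0.0, max(1.0, alpha)   # hi * sqrt(1 + hi) >= hi >= alpha
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mid * sqrt(1.0 + mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for alpha in (0.01, 1.0, 4.47):
    y0 = solve_y0(alpha)
    print(alpha, y0, y0 / alpha)    # the ratio tends to 1 for weak coupling
```

For small alpha the ratio y0/alpha approaches unity, the perturbation result; for large alpha, y0 grows only like alpha^(2/3).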

Near $x = 0$, the quantity $y$ increases quadratically with $x$. By development
one obtains

$$y - y_0 = \frac{x^2}{3}\,\frac{y_0}{2+3y_0} \quad \text{if } x^2 \ll 1. \qquad (3.6)$$
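The quadratic development can be checked numerically. The sketch below assumes that (3.4) reads sin(yx/alpha) = x/(1+y)^(1/2) and that the expansion is y - y0 = (x^2/3) y0/(2 + 3 y0); it solves (3.4) for y on the branch 1 + y >= x^2 by bisection and compares with the quadratic term. The function name and the value alpha = 1 are illustrative only.

```python
from math import asin, sin, sqrt, pi

def y_of_x(x, alpha, steps=200):
    """Solve y = (alpha/x) * asin(x / sqrt(1 + y)) on the branch 1 + y >= x**2."""
    def rhs(y):
        if x == 0.0:
            return alpha / sqrt(1.0 + y)          # limit x -> 0, i.e. (3.5)
        return (alpha / x) * asin(min(1.0, x / sqrt(1.0 + y)))

    lo = max(0.0, x * x - 1.0)                    # smallest y allowed on this branch
    if lo - rhs(lo) > 0.0:
        raise ValueError("x lies beyond the upper limit x1 of the branch")
    hi = lo + 0.5 * pi * alpha + 1.0              # here y - rhs(y) > 0
    for _ in range(steps):                        # y - rhs(y) is increasing in y
        mid = 0.5 * (lo + hi)
        if mid - rhs(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = 1.0
y0 = y_of_x(0.0, alpha)
for x in (0.05, 0.1, 0.2):
    exact = y_of_x(x, alpha) - y0
    quadratic = x * x * y0 / (3.0 * (2.0 + 3.0 * y0))
    print(x, exact, quadratic)
```

The two columns should agree closely for small x and drift apart slowly as x grows.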


The upper limit of this range is reached when $\sin(yx/\alpha)$ in (3.4) becomes
equal to unity, i.e. at values $x_1$ and $y_1$ satisfying

$$\frac{x_1}{(1+y_1)^{1/2}} = 1, \qquad x_1y_1 = \alpha\frac{\pi}{2}. \qquad (3.7)$$

Hence

$$x_1^2 - y_1 = 1, \qquad x_1(x_1^2 - 1) = \alpha\frac{\pi}{2}, \qquad (3.8)$$

showing that $x_1 > 1$, and with (3.2), that the energy at this upper limit
is equal to $\hbar\omega$. Also developing near $x_1$, $y_1$,


(y—a)— (yy —a2) = (a9—1)? SF (7 x)? if (u—mx,)*<ma,. (3.9) 
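The upper limit itself follows from a cubic in x1. As a hedged illustration, assuming (3.8) reads x1^2 - y1 = 1 and x1 (x1^2 - 1) = alpha*pi/2, the sketch below solves for x1 by bisection and confirms that sin(y1 x1/alpha) = 1 and that the reduced energy x1^2 - y1 equals unity there. The function name is hypothetical.

```python
from math import pi, sin

def upper_limit(alpha, steps=200):
    """Return (x1, y1) with x1*(x1**2 - 1) = alpha*pi/2 and y1 = x1**2 - 1."""
    c = 0.5 * pi * alpha
    t = c ** (1.0 / 3.0)
    lo, hi = 1.0, 1.0 + t           # x*(x*x - 1) - c is -c at lo and 2t + 3t^2 at hi
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mid * (mid * mid - 1.0) < c:
            lo = mid
        else:
            hi = mid
    x1 = 0.5 * (lo + hi)
    return x1, x1 * x1 - 1.0

alpha = 4.47                        # value quoted in the text for the drawings
x1, y1 = upper_limit(alpha)
print(x1, y1, x1 * x1 - y1, sin(y1 * x1 / alpha))
```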


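Turning now to the average velocity of the electron we shall prove that

$$v_x = \frac{1}{\hbar}\frac{\partial E(K)}{\partial K_x}, \qquad (3.10)$$

where $v_x$ represents the $x$-component. By definition,

$$v_x = \frac{\hbar}{im}\int\Psi^*\frac{\partial\Psi}{\partial x_e}\Big/\int\Psi^*\Psi, \qquad (3.11)$$

or from (2.22) using the orthogonality of the functions $\Psi_0$,

$$v_x = \frac{\hbar}{m}\left(K_x + \sum_w (K_x - w_x)|c_w|^2\right)\Big/\left(1 + \sum_w |c_w|^2\right) = \frac{\hbar}{m}\left(K_x - \frac{\sum_w w_x|c_w|^2}{1 + \sum_w |c_w|^2}\right). \qquad (3.12)$$

Now

$$w_x = w(\cos\theta\cos\vartheta - \sin\theta\sin\vartheta\cos\phi),$$

if $\theta$ is the angle between $\mathbf{w}$ and $\mathbf{K}$, $\vartheta$ the angle between $\mathbf{K}$ and the $x$-axis,

$$K_x = K\cos\vartheta, \qquad (3.13)$$

and $\phi$ is an azimuth around the $\mathbf{K}$-axis. Since $c_w$ is independent of
this azimuth $\phi$ we can average in (3.12) over $\phi$, thus transforming $w_x$ into
$w\cos\theta\cos\vartheta$. With the use of (3.13) we then find

$$\frac{m}{\hbar}v_x - K_x = -\frac{K_x}{K}\sum_w w|c_w|^2\cos\theta\Big/\left(1 + \sum_w |c_w|^2\right). \qquad (3.14)$$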




On the other hand, from (2.27) we find by using (2.26),

$$\frac{m}{\hbar^2}\frac{\partial E(K)}{\partial K_x} - K_x = -\frac{m}{\hbar^2}\frac{\partial}{\partial K_x}\sum_w\frac{|(e\Phi)_w|^2}{E_0(K, w) - E(K)} = \frac{K_x}{K}\sum_w |c_w|^2\left(K - w\cos\theta - \frac{m}{\hbar^2}\frac{\partial E(K)}{\partial K}\right). \qquad (3.15)$$

Hence the expression $(m/\hbar^2)(\partial E/\partial K_x) - K_x$ becomes equal to the right
hand side of (3.14); equation (3.10) then follows immediately.

We shall now calculate the value of the normalization integral $J$ (2.28)
which is required to judge the accuracy of the perturbation method.
By inserting from (2.26) with the use of (2.16), (2.11), (2.21) and (3.1)
we can write

$$J - 1 = \sum_w |c_w|^2 = \frac{\alpha u^3}{2\pi^2}\int\frac{dw\,d\Omega}{[(\mathbf{K}-\mathbf{w})^2 + u^2(1+y) - K^2]^2}. \qquad (3.16)$$

A straightforward integration leads to

$$J = 1 + \frac{\alpha}{2(1+y)(1+y-x^2)^{1/2}}. \qquad (3.17)$$

It will be noticed that at the lowest level $x = 0$, $y = y_0$, the value of $J$
becomes, using (3.5),

$$J = \frac{2+3y_0}{2(1+y_0)}. \qquad (3.18)$$
Thus near $x = 0$, the perturbation method converges if the interaction
parameter $\alpha$ is sufficiently small compared with unity. As we approach
the upper limit of our energy range, however, $J$ tends towards infinity
in view of (3.8). The perturbation method thus becomes increasingly
unsatisfactory when the variational method should still be reliable.
At the upper limit of our range $x = x_1$, however, $J$ tends to infinity which
means that here the variational method also breaks down.
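A short numerical check of the normalization integral may be useful. The sketch below assumes the readings J = 1 + alpha / (2 (1+y) (1+y-x^2)^(1/2)) for (3.17) and J = (2 + 3 y0) / (2 (1 + y0)) for (3.18); both are reconstructions rather than verified transcriptions, and the function names are hypothetical. It confirms that the general expression reduces to the lowest-level one at x = 0 once y0 satisfies y0 (1 + y0)^(1/2) = alpha.

```python
from math import sqrt

def solve_y0(alpha, steps=200):
    """Root of y0 * sqrt(1 + y0) = alpha by bisection."""
    lo, hi = 0.0, max(1.0, alpha)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mid * sqrt(1.0 + mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def J_general(alpha, x, y):
    """Assumed form of (3.17); diverges as 1 + y approaches x**2."""
    return 1.0 + alpha / (2.0 * (1.0 + y) * sqrt(1.0 + y - x * x))

def J_lowest(y0):
    """Assumed form of (3.18), valid at x = 0."""
    return (2.0 + 3.0 * y0) / (2.0 * (1.0 + y0))

for alpha in (0.1, 1.0, 4.47):
    y0 = solve_y0(alpha)
    print(alpha, J_general(alpha, 0.0, y0), J_lowest(y0))
```

Under these assumptions J at the lowest level stays between 1 and 3/2 for any coupling strength.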

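A further quantity of interest is the expectation value of the
polarization potential (2.4) at a certain distance from the electron. We
have considered this for the lowest level, $K = 0$, only. For this purpose
we first require the quantity

$$\Phi(\mathbf{r}, \mathbf{r}_e) = \int\Psi^*\Phi\Psi\Big/\int\Psi^*\Psi, \qquad (3.19)$$

where the integrals extend over all the coordinates except those of the
electron. Thus with (2.4)

$$\Phi(\mathbf{r}, \mathbf{r}_e) = \frac{4\pi}{V^{1/2}}\sum_w\frac{1}{w}\left(\bar{X}_w(\mathbf{r}_e)\sin\mathbf{w}\cdot\mathbf{r} + \frac{1}{M\omega}\bar{Y}_w(\mathbf{r}_e)\cos\mathbf{w}\cdot\mathbf{r}\right), \qquad (3.20)$$

where we define

$$\bar{X}_w(\mathbf{r}_e) = \int\Psi^*X_w\Psi\Big/\int\Psi^*\Psi, \qquad (3.21)$$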




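and similarly for $\bar{Y}$. Use of (2.22), (2.13) (replacing there $\mathbf{r}$ by $\mathbf{r}_e$) and
(2.28) leads to

$$\left(1 + \sum_w |c_w|^2\right)\bar{X}_w(\mathbf{r}_e) = \left(c_w\exp\{-i\mathbf{w}\cdot\mathbf{r}_e\} + c_w^*\exp\{i\mathbf{w}\cdot\mathbf{r}_e\}\right)\left(\frac{\hbar}{2M\omega}\right)^{1/2}. \qquad (3.22)$$

Here the square root is the ordinary oscillator matrix element for $X$, and
the value (2.12) of $M$ must be introduced. A similar expression is obtained
for $\bar{Y}_w(\mathbf{r}_e)$. By inserting $c_w$ from (2.26) the summation of (3.20) can be
carried out with the help of (2.11). Making further use of (3.5) and (3.18),
the result is:

$$\Phi(\mathbf{r}, \mathbf{r}_e) = -\left(\frac{1}{\epsilon_0} - \frac{1}{\epsilon}\right)\frac{e}{|\mathbf{r}-\mathbf{r}_e|}\left(1 - \exp\{-|\mathbf{r}-\mathbf{r}_e|\,u(1+y_0)^{1/2}\}\right)\frac{2}{2+3y_0}. \qquad (3.23)$$

We notice that the energy of interaction is given by

$$e\Phi(\mathbf{r}_e, \mathbf{r}_e) = -\left(\frac{1}{\epsilon_0} - \frac{1}{\epsilon}\right)e^2u\,\frac{2(1+y_0)^{1/2}}{2+3y_0} = -\frac{4(1+y_0)^{1/2}}{2+3y_0}W, \qquad (3.24)$$

which for small values of $y_0$ becomes equal to $-2W$. A fraction of this
energy has to be spent to polarize the lattice.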
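The limiting statement above can be made concrete with a few numbers. This sketch assumes the readings e*Phi(r_e, r_e) = -4 (1+y0)^(1/2) W / (2 + 3 y0) for (3.24) and, for (3.23), an interaction energy proportional to (1 - exp(-rho (1+y0)^(1/2))) / rho at the reduced distance rho = u |r - r_e|, together with the assumption W = (1/2)(1/eps0 - 1/eps) e^2 u. The function names are hypothetical.

```python
from math import expm1, sqrt

def self_interaction_over_W(y0):
    """e*Phi(r_e, r_e) / W according to the assumed form of (3.24)."""
    return -4.0 * sqrt(1.0 + y0) / (2.0 + 3.0 * y0)

def interaction_over_W(rho, y0):
    """e*Phi(r, r_e) / W from the assumed form of (3.23), rho = u*|r - r_e| > 0."""
    kappa = sqrt(1.0 + y0)
    # -expm1(-t) evaluates 1 - exp(-t) without cancellation at small t
    return -2.0 * (-expm1(-kappa * rho)) / rho * 2.0 / (2.0 + 3.0 * y0)

for y0 in (0.0, 0.5, 2.7):
    print(y0, self_interaction_over_W(y0), interaction_over_W(1e-6, y0))
```

At y0 = 0 the first function returns -2, the weak-coupling value quoted in the text, and the second function approaches the first as rho tends to zero.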


RESULTS AND DISCUSSION.


§4. PROPERTIES OF THE LOWEST ENERGY LEVELS.


The results which we have obtained in the previous calculations
concern the lowest energy levels of the system consisting of one electron
and of a continuous dielectric medium possessing a proper frequency $\omega/2\pi$.
Apart from the electronic charge $e$ and mass $m$ (which might be replaced
by an effective mass to account for the periodic lattice field) they contain
three parameters all of which can be found experimentally. These are
the static dielectric constant $\epsilon$, the optical refractive index $\epsilon_0^{1/2}$, and the
frequency $\omega/2\pi$ of long longitudinal polarization waves. The latter can
be expressed by the frequency $\omega_t/2\pi$ of long transverse waves which is the
experimentally known frequency of infra-red residual waves. One finds
(Fröhlich 1949, §18)

$$\omega = \omega_t(\epsilon/\epsilon_0)^{1/2}. \qquad (4.1)$$


Long wave in this connection means long compared with the lattice
distance. The calculations have shown that the length $b$ introduced
for dimensional reasons in (1.4) is the characteristic length occurring in
our results. The fact that its ratio to the lattice distance


mane es 
7 =a (smc) >} see Cone eae AES 


is large compared with unity for any reasonable value of $\omega$ justifies our
procedure to treat the material as continuous.




The method which we have adopted contained a development of the
polarization of the dielectric in normal coordinates. In the absence of
electrons the behaviour of a dielectric material is then described in terms
of quantized polarization waves. All quanta have the same energy $\hbar\omega$,
but their momenta may differ, depending on the wavelength. The
interaction with an electron can then be described in terms of absorption
and emission of quanta $\hbar\omega$ by the electron; this is in fact the procedure
previously used to calculate the scattering of electrons (cf. I.). At present
we are, however, mainly concerned with the energy levels of our system.
Their broadening by scattering processes would lead us at once into
difficulties. We have escaped this, in the present paper, by restricting
ourselves to the lowest energy levels in a range $\hbar\omega$ above the ground
level; in this case a free quantum can never exist. It should be realized
that energy levels always refer to the energy of the whole system. To
assume that not a single free quantum is excited in an infinite dielectric
can only mean that its temperature is zero. We shall find below, however,
that actually the only restriction required is to assume that


LTE eae aN cn ohare ins. (LOO) 


In the mathematical technique which we have adopted we chose solutions
which did not allow the existence of free quanta; for technical reasons
they break down at an energy somewhat higher than $\hbar\omega$ above the ground
level. We shall present them in this whole range, but it should always
be kept in mind that they are reliable only if the energy difference $\Delta E$
from the ground level is smaller than $\hbar\omega$,

$$\Delta E < \hbar\omega. \qquad (4.3)$$


Lowest Levels.—According to (3.5), using (3.1) and (2.19) the lowest
energy level $E(0)$ is obtained from the equation

$$-E(0)[1 - E(0)/\hbar\omega]^{1/2} = W, \qquad (4.4)$$

i.e.

$$E(0) \simeq -W \quad \text{if } W \ll \hbar\omega, \qquad (4.5)$$

and

$$E(0) \simeq -(\hbar\omega)^{1/3}W^{2/3} \quad \text{if } W \gg \hbar\omega. \qquad (4.6)$$

The energy $W$ depends on the three parameters $\epsilon$, $\epsilon_0$ and $\omega$ according to
equations (1.5) and (1.4). For practically interesting cases usually
$\alpha = W/\hbar\omega$ is somewhat larger than unity (cf. §5). Also it should be
remembered that $W \propto \omega^{1/2}$, i.e. $\alpha \propto 1/\omega^{1/2}$.

The lowest level is the lower limit of a continuum; the energy difference
from the lowest level, $\Delta E(K)$, depends on the parameter $K$ where $\hbar K$ is
the momentum of our system. This should be distinguished from the
electronic contribution $mv$ ($v$ = electronic velocity) because the dielectric
carries a part of the momentum. It follows from (3.1) and (2.19) that

$$\Delta E(K) = E(K) - E(0) = \frac{\hbar^2}{2m}K^2 + \hbar\omega(y_0 - y), \qquad (4.7)$$




where $y$ satisfies equation (3.4). In the neighbourhood of $K = 0$, in
particular, using (3.6) and (3.1),

$$\Delta E(K) \simeq \frac{\hbar^2K^2}{2m}\left(1 - \frac{1}{3}\,\frac{y_0}{2+3y_0}\right). \qquad (4.8)$$

The second term in the bracket takes account of the influence of the
interaction between electron and dielectric. Since the quantity
$y_0 = -E(0)/\hbar\omega$ is positive the value of the bracket is always between
1 and 8/9. Thus, though the shift of an energy level due to interaction
may be considerable (compared with $\hbar\omega$) it depends only very little
on the momentum.
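The bounds on the bracket are easy to tabulate. The sketch below assumes the bracket in (4.8) reads 1 - (1/3) y0/(2 + 3 y0), and compares it with the perturbation value 1 - y0/6 that the text quotes in the next paragraph; the function names are illustrative.

```python
def bracket_variational(y0):
    """Bracket of (4.8) in the assumed form 1 - y0 / (3 * (2 + 3*y0))."""
    return 1.0 - y0 / (3.0 * (2.0 + 3.0 * y0))

def bracket_perturbation(y0):
    """Second-order perturbation value 1 - y0/6 quoted in the text."""
    return 1.0 - y0 / 6.0

for y0 in (0.0, 0.1, 1.0, 2.7, 10.0, 1000.0):
    print(y0, bracket_variational(y0), bracket_perturbation(y0))
```

The variational bracket stays between 8/9 and 1 for every positive y0, while the perturbation bracket turns negative once y0 exceeds 6, which is the contradiction the text refers to.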

It should be mentioned here that ordinary second order perturbation
gives the value (4.5) for the lowest level, and replaces the above bracket
by $(1 - y_0/6)$. If this method were applied when the condition $y_0 \ll 1$
does not hold it would lead to the contradictory result that the state
$K = 0$ is not the lowest. This demonstrates the supremacy of the
variational method.

Turning to larger values of $K$, the deviations of $\Delta E(K)$ from $\hbar^2K^2/2m$
become increasingly larger. In fact the curvature of the curve representing
$\Delta E(K)$ against $K$ changes sign and ends with horizontal tangent at a
value $K_1 = ux_1$ (cf. 3.1) of $K$ which constitutes the limit beyond which
the mathematical formalism breaks down. At this value, according to
(3.8) and (3.1),

$$E(K_1) = \hbar\omega. \qquad (4.9)$$

Since $E(0) < 0$ it follows that always

$$\Delta E(K_1) > \hbar\omega. \qquad (4.10)$$

Hence (cf. 4.3) this upper region is outside the range of validity of our
calculations.

Velocity.—Making use of (3.10) and (4.8), the average electronic velocity
near the ground level is given by

$$v = \frac{\hbar K}{m}\left(1 - \frac{1}{3}\,\frac{y_0}{2+3y_0}\right). \qquad (4.11)$$

For higher values it reaches a maximum and then falls to zero. Again
these latter parts are outside the range of validity of our calculations.

Acceleration. An external field $F$ accelerates our system. To discuss
this question we imagine a wave packet to be formed so that we can
approximately fix the position of the electron. Shifting it a distance $\Delta x$
means a transfer of energy $eF\Delta x$ from the field to the electron. Hence,
the value of $K_x$ changes by an amount $\Delta K_x$ and the energy, treating $\Delta K_x$
as small, by $\Delta K_x\,\partial E(K)/\partial K_x$. Dividing by the time $\Delta t$ required, we
have with $v_x = \Delta x/\Delta t$ and with (3.10),

$$\frac{\Delta K_x}{\Delta t} = \frac{eF}{\hbar}, \qquad (4.12)$$




an equation well known from the theory of metals. Using again (3.10)
we find

$$\frac{\Delta v_x}{\Delta t} = \frac{1}{\hbar}\frac{\partial^2E}{\partial K_x^2}\frac{\Delta K_x}{\Delta t} = \frac{eF}{\hbar^2}\frac{\partial^2E}{\partial K_x^2}. \qquad (4.13)$$

In the region in which (4.8) and (4.11) hold the acceleration is thus
similar to that of a free electron. For larger values of $K$ it decreases, reaches
zero when the velocity has its maximum and then reverses its sign. The
latter region again is outside the range of validity of our calculations.


Fig. 1.

Curve a represents the energy of electrons in units of $\hbar\omega$ (i.e. the quantity $x^2 - y$)
as a function of the reduced momentum $x$; the dotted curve refers to
free electrons. Curve b is the same curve shifted so as to demonstrate
that the interaction hardly depends on the momentum. Validity
outside the dotted region is doubtful.


The main results of our calculations are shown in the accompanying
figures. The quantities $\hbar\omega$ and $\hbar u$, respectively, have been used as units
of energy and momentum. The results according to §3 then depend
on the interaction parameter $\alpha$ only. The value $\alpha = 4.47$ has been used
in the drawings (for NaCl it would be 5.2). Fig. 1 represents the energy
as function of the momentum. For comparison a free electron curve




has been drawn showing a shift considerably larger than $\hbar\omega$ (i.e. unity
in the present units). A displaced free electron curve starting at the
same ground level demonstrates that the shift due to interaction is
nearly independent of the momentum. The validity of the energy curve
is restricted to values in a range $\hbar\omega$ above the ground level.

Fig. 2 shows the average electronic velocity as a function of momentum, 
compared with the case of free electrons. Within the range of validity 
the deviations are seen to be small. 


Fig. 2.

The full line is the average electronic velocity as a function of momentum;
the straight line corresponds to free electrons. Validity outside dotted
region is doubtful.


In fig. 3 we show the density of energy levels per unit energy $D(E)$ as
a function of energy, and the total number of levels up to a given energy,
$\int D(E)\,dE$. For comparison the corresponding curves in the case of free
electrons are also drawn. The quantity $D(E)$ can easily be calculated
from relation


_,dK | 
DUH) Sa eee rt earn (4-14) 




§5. THE QUESTION OF SELF-TRAPPING. 


As mentioned at the end of §1 it has first been suggested by Landau 
(1933) that an electron might “ dig its own hole” and thus be trapped 
in the medium. The electron would thus be described by a probability 
distribution whose value is appreciable only in the neighbourhood of a 
given point in the medium. It will then give rise to a certain polarization 
which, if frozen in, would act as a centre of attraction for the electron
and lead to at least one bound state. Probability distribution 
and polarization can be made self consistent. The result is a particular 
solution of (2.6). 


Fig. 3.

Curve a represents the density of energy levels as a function of energy; b shows
the total number of levels. The dotted curves refer to free electrons.
Validity on the right of the vertical line is doubtful.


One of the objections to be raised against the above method is the
preference given to a particular point in the medium. No doubt, in the
exact solutions for stationary states the electron can be found with equal
probability at any point. Our solution (2.22), (2.26) satisfies this requirement.
At the same time the polarization relative to the position $\mathbf{r}_e$
of the electron (cf. fig. 4) shows a very similar behaviour as it does in the
Landau method relative to a fixed point. The polarization potential
$\Phi(\mathbf{r}, \mathbf{r}_e)$ behaves for large distances like $1/|\mathbf{r}-\mathbf{r}_e|$ down to distances
of the order $b$ (provided $\alpha$ is not very big). For smaller distances
$\Phi(\mathbf{r}, \mathbf{r}_e)$ varies only slowly as compared with $1/|\mathbf{r}-\mathbf{r}_e|$.




A second objection arises when the linear extension of the probability
distribution is larger than our characteristic distance $b$, equation (1.4).
In this case it is no longer possible to assume the polarization to be rigid.
This objection mainly holds for small values of the interaction parameter
$\alpha$ [cf. (3.1)], or if $\alpha$ is large, for excited states whose orbit is larger than $b$.

Comparison of our results for the lowest energy level $E(0)$ [equation (4.4)]
with those obtained by Pekar (1946) (using the Landau method) confirms
the above conjecture in the case that the quantity $c = (1/\epsilon_0 - 1/\epsilon)$, and
hence (cf. 3.1) $\alpha$, is small compared with unity. For in this case Pekar's


Fig. 4. 


The full curve represents the average potential (3.23) as a function of a reduced
distance $\xi = |\mathbf{r}-\mathbf{r}_e|\,u(1+y_0)^{1/2}$ from the electron; the dotted curve
refers to the case of a point charge.


value for $|E(0)|$ is proportional to $c^2$ and therefore of a smaller order than
ours which [cf. (4.5) and (1.5)] is proportional to the first power of $c$. Our
value for $|E(0)|$ on the other hand, being based on a variational method
is certainly not too large.

For strong interaction (large $\alpha$), both methods lead to similar values
for $E(0)$. Thus Markham and Seitz (1948) using methods developed
by Mott and Littleton (1938) and by Pekar (1946) find the value of
−0.13 eV. as compared with −0.09 in our calculations *.

* Obtained from equation (4.4) with the help of (1.5) and (1.4) using the
following numerical values: $\epsilon = 5.6$, $\epsilon_0 = 2.3$, $\omega = 4.8\times10^{13}$ sec$^{-1}$. The latter
corresponds to a residual wavelength $(2\pi c_0/\omega)(\epsilon/\epsilon_0)^{1/2} = 61\,\mu$, $c_0$ = velocity
of light.
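The footnote values can be reproduced approximately with a short calculation. This is a sketch under stated assumptions: it takes W = (1/2)(1/eps0 - 1/eps) e^2 u with hbar^2 u^2 / 2m = hbar*omega as the reading of (1.5) and (1.4), which are not reproduced in this section, uses the free electron mass, and uses rounded modern CGS constants rather than the 1950 values, so small differences from the quoted figures are to be expected.

```python
from math import pi, sqrt

# CGS (Gaussian) constants; rounded modern values, not those available in 1950.
HBAR = 1.054572e-27        # erg s
M_E = 9.10938e-28          # g (free electron mass, an assumption)
E_CH = 4.80320e-10         # esu
C0 = 2.997925e10           # cm / s
ERG_PER_EV = 1.602177e-12

def solve_y0(alpha, steps=200):
    """Root of y0 * sqrt(1 + y0) = alpha by bisection."""
    lo, hi = 0.0, max(1.0, alpha)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mid * sqrt(1.0 + mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps, eps0, omega = 5.6, 2.3, 4.8e13                      # footnote values

u = sqrt(2.0 * M_E * omega / HBAR)                       # hbar^2 u^2 / 2m = hbar*omega
W = 0.5 * (1.0 / eps0 - 1.0 / eps) * E_CH**2 * u         # assumed reading of (1.5)
alpha = W / (HBAR * omega)
E0_eV = -solve_y0(alpha) * HBAR * omega / ERG_PER_EV     # (4.4) as y0*sqrt(1+y0) = alpha
wavelength_micron = 2.0 * pi * C0 / omega * sqrt(eps / eps0) * 1.0e4

print(alpha, E0_eV, wavelength_micron)
```

With these inputs the interaction parameter comes out a little above 5, the lowest level close to -0.09 eV and the residual wavelength close to 61 microns, in line with the figures quoted in the text.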




The case of the Landau method for a calculation of the lowest energy 
level thus can be expected to lead to reasonable values only if the 
interaction parameter « is large. This method as described so far does 
not yet lead to self-trapping. Two further assumptions would have to 
be made which have never been proved. They are both objectionable 
and our calculations prove in fact that they are wrong. They are 


(i.) that the whole configuration cannot move freely through the lattice, 

(ii.) that there would be a continuum above the (presumed) discrete
ground level similar to the case of an electron moving in a permanent
field of force.


This latter assumption would lead to an infinitely extended probability 
distribution for which, as we have shown above, the Landau method 
cannot be applied. 

Assumption (i.) is disproved by our calculations which show that the
energy difference of states with non-vanishing momentum $\hbar K$ from the
ground state ($K = 0$) hardly differs from that for free electrons although
the shift of energy levels due to the interaction may be large. That
assumption (i.) cannot be correct can also be seen more directly from the
fact that the polarization due to a moving point charge differs from that
of a static one only at distances below $v/\omega$. For an extended charge
distribution, representing the electron in the Landau method, no
appreciable dependence of the polarization on the velocity can be
expected so long as $v/\omega$ is smaller than the main extension of the charge.
In this case, then, the interaction energy (including the energy required
to polarize the medium) is nearly independent of velocity.

It should be added that our method appears to be quantitatively
correct for values of $\alpha$ up to $\alpha \simeq 1$. For larger values ($\alpha \simeq 5$ for NaCl) we
hope that the variational method still ensures a correct order of magnitude.
Improved calculations are in progress to investigate this.


§6. MEAN FREE PATH.


In calculating the conductivity the average contribution per electron
is usually written in the form $e^2\bar{\tau}/m$ where $\bar{\tau}$ is a quantity with the dimension
of a time, the average relaxation time. The quantity $e\bar{\tau}/m$ is often
denoted as mobility. If a model is used in which electrons are free to
move except for occasional elastic collisions, then a relaxation time $\tau(v)$,
depending on the velocity $v$ can be introduced in such a way that its
average over all electrons, $\overline{\tau(v)}$, is equal to $\bar{\tau}$,

$$\overline{\tau(v)} = \bar{\tau}. \qquad (6.1)$$

Similarly a mean free path $l(v)$ and an average mean free path $\bar{l}$ can be
introduced by


I(v)=7(v)v, Paue ee ae ee on Oey 




The relaxation time $\tau(v)$ is not simply the average time between two
collisions but contains also the scattering angle; it might be called the
average time between two large angle scatterings.

In our case of electrons in an ionic crystal the assumption that the
interaction between electrons and polarization waves can be treated as
a small perturbation leads to a straightforward calculation of the
conductivity, and hence of the relaxation time although the collisions
are not elastic and are connected with absorption and emission of
quanta $\hbar\omega$. It turns out, however, that only for electronic energies
large compared with $\hbar\omega$ (treated in I.) or smaller than $\hbar\omega$ (Fröhlich and
Mott 1939) is this treatment reliable. Energies which are only slightly
larger than $\hbar\omega$ lead to relaxation times which are smaller than $1/\omega$ or to
mean free paths shorter than the de Broglie wavelengths as we shall see
presently. This means that the method is then no longer valid.

To treat this case of energies of several $\hbar\omega$ the calculation in (I.) must
be slightly modified. For in (I.) we were mainly concerned with fast
electrons scattered primarily by very short polarization waves. Electrons
with energies of only several $\hbar\omega$, however, absorb or emit only very long
polarization waves as follows from conservation of energy and momentum.
This means that the assumptions made at the beginning of this paper
(especially §2) hold. Instead of carrying through the calculations in
detail it is sufficient to compare with (I.) the expressions for the matrix
elements and from there derive the replacements to be made in the final
expression. The reason is that the Hamiltonian used in (I.) differs from
ours only in the value of constant parameters. Comparing now the
expression for the matrix element $M$ in (I. page 236) with our matrix element $(e\Phi)_{\mathbf{K}\mathbf{K}'w}$
[equations (2.15) and (2.16)] we see that we have to replace


an 1 
ayy, 8 rahe 
Gn), Weak. (oes 


of (I.) by


Teint . 
V, Die, 1s (=-=)] aS Gna 


respectively. Thus equations (15) and (16 a) of (I.) must be replaced by 


Lo el 2 
eed Od | Od «Fora. =< Hewr Th 
z =f Y exp oT | oc Tat Lneaue.: Some | Gaay 
and 
il 1 1\ ez A 
tT) (= az *) Rat if E'>m Nam =,severalfiw. . . (6.5) 


The energy HL’ (denoted by 2%7?h?/(8ma?) in I.) is of the order of 1 eV. 
In thermal equilibrium the condition attached to (6.5) holds for an
appreciable number of electrons only if kT > ħω. It then follows that


1 1 Deen 
==(<-2)4e — if kP>hw. 2... (66) 


Ew 




Hence using (6.2) and v² = 3kT/m we obtain


= 1) Ge. : ’ 

fem re 200 a hore li er (6s) 
where 

$$a_0 \simeq 0.5 \times 10^{-8}\ \text{cm} \qquad (6.8)$$


is the Bohr radius. 

The remarkably simple result leads to values of l of the order of
5 × 10⁻⁸ cm. This is appreciably smaller than the de Broglie wavelength
of thermal electrons at room temperature. The quantitative validity
of (6.7) is, therefore, doubtful.
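
The comparison with the de Broglie wavelength can be checked by a short calculation. The following is a minimal sketch only: it assumes the free electron mass (the effective mass is not fixed at this point in the text), takes v² = 3kT/m for the thermal velocity, and uses rounded values of the constants in c.g.s. units; the function name is hypothetical.

```python
import math

# Rounded physical constants in c.g.s. units (assumed values, not from the paper).
H_PLANCK = 6.626e-27    # erg s
K_BOLTZ = 1.381e-16     # erg / K
M_ELECTRON = 9.109e-28  # g, free electron mass (assumption: effective mass = m)

def thermal_de_broglie_cm(temperature_k, mass_g=M_ELECTRON):
    """de Broglie wavelength h/(m v) in cm, with v^2 = 3kT/m."""
    speed = math.sqrt(3.0 * K_BOLTZ * temperature_k / mass_g)
    return H_PLANCK / (mass_g * speed)

wavelength = thermal_de_broglie_cm(300.0)
print(f"de Broglie wavelength at 300 K: {wavelength:.2e} cm")
print(f"ratio to a mean free path of 5e-8 cm: {wavelength / 5e-8:.1f}")
```

On these assumptions the wavelength comes out roughly ten times larger than the mean free path quoted above, which is the sense of the objection raised in the text.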

Electrons with energies below ħω can only absorb but not emit quanta
ħω. The perturbation calculation for this case has been carried through
by Fröhlich and Mott (1939), modifying the method developed in (I.) so
as to hold for very slow electrons. In the light of the present paper a
further modification of the constant is now required, by making the
replacement (6.3). This should be made in the second formula on
page 501 of Fröhlich and Mott (1939), which leads us to replace

Tee cares b ees oye l 1 

pieremenae 9s. aaa eos) 
considering equation (6.8). Use has been made of the fact that k_0 of the
previous paper has now been denoted by u, cf. equation (2.21), and that
ħk/m = v. Applying then equation (9) of Fröhlich and Mott and (2.21) we
obtain for the mean free path


(6.9) 


€ 


€€ mv \t 
l=rTv=7,v(exp {hw/kT }—1) = 3g - ar a) (exp {hw/kT}—1). (6.10) 
The average mean free path thus becomes 
3586 €€ 5 Gr 3 


a ee Ton) (&8P fhen/kP}—1) ieee <i... .(G.11) 


It is larger than the one obtained previously [equation (11)] by a factor

$$\frac{\epsilon\epsilon_0}{\epsilon-\epsilon_0+1}. \qquad (6.12)$$

As a consequence of this correction there is a considerable improvement
in the agreement between theoretical and experimental values of l of
cuprous oxide. The previous formula gave values eight times too small;
the factor (6.12) reduces this to about two (ε ≃ 10). It is thus required
to assume the effective mass to be only slightly smaller than the electron
mass to obtain agreement.

The above formulae for mean free paths have been derived on the
assumption that perturbation theory is applicable. We found in
§§ 2 and 3 that at the absolute zero of temperature this holds if the
interaction parameter α given by equation (3.1) is small, α < 1. This
result can easily be extended to higher temperatures as long as kT < ħω.
For on this condition relatively few free quanta are excited; they influence
only little the calculations of the energy of interaction so long as we deal
with electrons whose kinetic energy, in zero approximation, is sufficiently
far below ħω.

Thus although equation (6.11) seems well founded if α < 1, a serious
objection can be raised. Absorption of a quantum ħω leads the electron
into a level from which after a very short interval it would re-emit a
quantum ħω, in general with different momentum. This suggests treating
such collisions as second order transitions, leading to the absorption
of a quantum ħω, followed by a competitive emission of another quantum
ħω whose wave number w can have any value in a whole range. The
formal method to be adopted would be very similar to that used by
Breit and Wigner (1936) in nuclear physics *.

It should also be clear that by allowing the temperature T to be
different from zero the energy of our whole system is always considerably
higher than ħω above the ground level. We have just seen that for
low temperatures this would not seriously affect our calculations as long
as perturbation theory is valid. This holds also for larger values of the
interaction parameter α where the variational method differs considerably
from the perturbation method, but only as far as the energy of interaction
is concerned. Collisions would require an entirely new treatment. The
former contention follows at once by considering in the variational
calculations, especially in the sum in (2.27), only those wave numbers for
which free oscillations are not excited thermally and then treating the
latter by perturbation theory. This is permissible because their relative
number is small so long as kT < ħω; the subsequent change in the
expression for the energy [cf. (3.3)] is therefore small too. In the case of
collision, however, the method cannot be applied because it is then no
longer possible to separate the treatment of excited and nonexcited
oscillators.


The authors are indebted to the British Electrical and Allied Industries 
Research Association for their support. 


REFERENCES. 


Breit and Wigner, 1936, Phys. Rev., 49, 519.

Fröhlich, H., 1937, Proc. Roy. Soc. A, 160, 230; 1949, Theory of Dielectrics,
Clarendon Press, Oxford.

Fröhlich, H., and Mott, N. F., 1939, Proc. Roy. Soc. A, 171, 496.

Fröhlich, H., and Pelzer, 1950, To be published; also E.R.A. Report, L/T 184.

Landau, 1933, Z. Phys. Sowjet., 3, 664.

Markham and Seitz, 1948, Phys. Rev., 74, 1014.

Mott, N. F., and Littleton, 1938, Trans. Faraday Soc., 34, 485.

Pekar, 1946, Journ. Physics U.S.S.R., 10, 341.


* Added in proof. One of us (S.Z.) has now carried out this calculation. It
leads to a further factor } in equation (6.11). 




XXI. Kinetics of the Phase Transition in Superconductors. 


By A. B. Pippard,
Royal Society Mond Laboratory, Cambridge *. 


[Received December 15, 1949] 


ABSTRACT. 


A study is made of the influence of electromagnetic effects on the speed
at which a transition can occur between the normal and superconducting
states of a metal in a magnetic field, and it is concluded that these are
powerful enough to be the dominant factor determining the speed.
Illustrative examples include the transition of a superconducting plane
slab and cylinder in a field greater than critical, the destruction of
superconductivity in a wire by means of a current, and the mechanism whereby
the intermediate state is established.


I. INTRODUCTION.


THE phase-transition of a superconductor into the normal state through
the influence of a magnetic field has been extensively studied from the
point of view of determining the conditions under which the two phases
may exist side by side in equilibrium. Typical of the experimental work
on this problem are determinations of critical magnetic fields; studies of
the destruction of superconductivity in thin wires by passage of a current;
and investigations of the intermediate state in bodies of non-zero
demagnetizing coefficient. At the same time fundamental thermodynamical
studies of the various problems have served to correlate the results of
these experiments with other properties of the superconductor, such as
the specific heat. Very much more attention has been directed to this,
the statical aspect of the problem, than to the kinetic aspect, and scant
consideration has been given to the mechanisms which are involved during
the progress of the transition. Examples of the types of phenomena which
demand kinetic treatment are such problems as the rate at which the
transition from the superconducting to the normal state occurs in a field
greater than the critical field, or from the normal to the superconducting
state in a field less than critical; the closely allied problem of the rate of
propagation of a phase boundary along a wire; and the problem of the
mechanism whereby the intermediate state is established. Of the
experimental work on this aspect of the phase transition we may cite as
examples the investigation by Lazarev, Galkin and Khotkevich (1947) of
the behaviour of wires carrying high-frequency alternating currents of
such amplitude as to induce a phase transition, and the recent work of


* Communicated by the Author.




Faber (1949) on the propagation of phase boundaries along wires. There 
appears to have been no discussion given of the electromagnetic processes 
which will have to be considered in any attempt to account for the observed 
phenomena, and the present paper represents an attempt to provide a 
starting-point for more rigorous treatments than can be given at the 
moment. It is unfortunate that the two experiments quoted are concerned 
with transitions which will be difficult to treat rigorously, and it is there- 
fore not considered expedient to give any discussion of them here. On the 
other hand, the physical principles of the electromagnetic processes 
underlying these complicated examples may be understood clearly from 
idealized simple models, and the way may therefore be paved towards at 
least a qualitative understanding of the problems of more practical 
interest. 


II. THE PHASE TRANSITION IN A PLANE SEMI-INFINITE SLAB.


Consider a semi-infinite slab of a superconducting material having a
plane surface normal to the X-axis, as in fig. 1. If a magnetic field H_e,
greater than the critical field H_c, be suddenly applied parallel to the surface,
we may imagine that a thin layer at the surface makes a transition into
the normal state, so that there is created a plane phase boundary parallel
to the surface. The boundary will move into the interior of the metal
as the normal region expands, and we wish to find its rate of propagation.
It is clear that so long as the external field exceeds H_c there can never be
a stable position of the boundary, since if it comes to rest the field at the
boundary will eventually rise to a value H_e. On the other hand, the rate
of propagation cannot be too great, since as the normal region expands
magnetic flux must enter it, and in doing so induce currents whose effect
is to reduce the field strength at the boundary. It seems reasonable to
suppose, then, that the boundary moves at such a rate that the field
strength next to it is maintained through the action of the induced currents
at a value H_c.

Fig. 1. [Diagram: the normal layer, of depth x_0, and the superconducting
region beyond it, with the applied field directed parallel to the surface.]




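The fields in the normal metal must satisfy the equations (in which
the units are electromagnetic):

$$\frac{\partial H}{\partial x} = -4\pi J, \qquad \frac{\partial E}{\partial x} = -\frac{\partial H}{\partial t}, \qquad \text{and} \qquad J = \sigma E.$$

Hence

$$\frac{\partial^2 H}{\partial x^2} = 4\pi\sigma\,\frac{\partial H}{\partial t}.$$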

If the field at the surface, H_e, be written as H_c(1+p), and the field H
at any point in the normal metal be written as H_c(1+f), then f satisfies
the same differential equation,

$$\frac{\partial^2 f}{\partial x^2} = 4\pi\sigma\,\frac{\partial f}{\partial t}, \qquad (1)$$
which must be solved subject to the following boundary conditions:

(a) f = p when x = 0,

(b) f = 0 when x = x_0, x_0 being the depth of the phase boundary below
the surface at time t,

(c) ∂f/∂x = −4πσ dx_0/dt when x = x_0.

Condition (c) follows from the fact that the electric field in the normal
metal next to the boundary is equal to H_c dx_0/dt. A solution * of equation (1)
may be found in the form:

$$f = f(y), \quad \text{where } y = x/x_0.$$
Substituting in equation (1), we find that

$$f'' + 2\pi\sigma y\,\frac{d}{dt}(x_0^2)\cdot f' = 0. \qquad (2)$$


In order that f shall be a solution it is necessary that the coefficient of f′
shall be time-independent. We may therefore write

$$2\pi\sigma\,\frac{d}{dt}(x_0^2) = \alpha p, \quad \text{where } \alpha \text{ is a constant;}$$

i.e.

$$x_0^2 = \frac{\alpha p t}{2\pi\sigma}. \qquad (3)$$


* After this solution had been derived, it was pointed out to me by Dr. P. M. 
Marcus that the problem is analogous to that of the freezing of a lake, which is 
discussed by Carslaw and Jaeger (1947). I feel that there are sufficient 
differences in the treatments to justify setting out the following method of 
solution in full. 


The constant α may be determined from equation (2) and the boundary
conditions. We have that

$$f'' + \alpha p y f' = 0.$$

Hence, on integrating and applying condition (c),

$$f' = -\alpha p \exp\{\tfrac{1}{2}\alpha p(1-y^2)\}.$$

Integrating once more and applying conditions (a) and (b) we find that α
must satisfy the equation

$$\alpha\int_0^1 \exp\{\tfrac{1}{2}\alpha p(1-y^2)\}\,dy = 1.$$


Fig. 2. [Graph: α plotted against p.]


This equation is readily solved numerically, and in fig. 2 the variation of
α with p is exhibited. It will be seen that for small values of p, α may be
taken as unity without great error, and equation (3), for the variation
with time of the depth of the boundary below the surface, reads:

$$x_0^2 = \frac{pt}{2\pi\sigma}. \qquad (4)$$


This approximate result might have been obtained very simply by
supposing the field strength in the normal metal to be everywhere H_c.
For then a uniform current of density σH_c dx_0/dt will be induced by the
movement of the boundary, so that in order to maintain a difference in
field strength pH_c between the boundary and the surface it is necessary
to put

$$4\pi\sigma H_c x_0\,\frac{dx_0}{dt} = pH_c,$$

which leads immediately to equation (4). It will be found valuable to
keep this approximate method in mind when we come to consider problems
to which the above method of exact solution does not apply. The
assumption that the field strength in the normal metal is everywhere H_c
may be regarded as the first stage in the solution of the problem by
successive approximation. We have seen that this assumption leads to
a uniform current density in the normal metal, and we may now use this
result to derive a closer approximation to the field distribution. The next
stage in the calculation will thus start from the assumption that the field
varies linearly from H_c at the phase boundary to H_c(1+p) at the surface.
A closer approximation to the current distribution may then be reached
immediately, leading to the result of equation (3) with α put equal to
3/(3+p). This expression is shown as a dotted line in fig. 2, from which
it is clear that the second approximation is close to the exact solution
over a wide range of p.
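
The numerical solution for α may be illustrated by a short sketch. It is not the method used to prepare fig. 2, which the text does not describe; it assumes the condition on α is α∫₀¹ exp{½αp(1−y²)} dy = 1, evaluates the integral by Simpson's rule, and finds the root by bisection. The function names are hypothetical.

```python
import math

def boundary_integral(alpha, p, steps=2000):
    """Simpson estimate of the integral of exp{alpha*p*(1 - y^2)/2} for 0 <= y <= 1."""
    h = 1.0 / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        y = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(0.5 * alpha * p * (1.0 - y * y))
    return total * h / 3.0

def solve_alpha(p, tol=1e-10):
    """Bisection on 0 < alpha <= 1 for alpha * integral = 1 (left side rises with alpha)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * boundary_integral(mid, p) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (0.1, 0.5, 1.0, 2.0):
    print(p, round(solve_alpha(p), 4), round(3.0 / (3.0 + p), 4))
```

Each printed line gives p, the root α, and the second approximation 3/(3+p), so that the two may be compared directly.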

It will be noted that the depth of the boundary, given by equation (3),
varies as the square root of the time, as is to be expected from equation (1),
which has the same form as the diffusion equation. Equations (3) and
(4) bear a resemblance to the result obtained in the theory of the skin
effect, that the depth of penetration of a field alternating with angular
frequency ω is given by the expression:

$$\delta^2 = 1/2\pi\omega\sigma.$$

Thus, if we use the approximate solution (4), we see that for an external
field 2H_c (i.e. p = 1), the depth of the phase boundary after a time t is
the same as the skin depth in the normal metal for a frequency 1/t. For
good conductors the rate of progress of the boundary is consequently
very slow. For example, a specimen of pure tin may have in the normal
state a conductivity of 5 × 10⁸ ohm⁻¹ cm.⁻¹, or in e.m.u., σ = 0.5. If the
external field exceeds the critical field by 10 per cent, p = 0.1, and equation
(3) becomes

$$x_0^2 = 0.03t.$$

A time of about 30 seconds is therefore needed for the boundary to
penetrate 1 cm., and nearly one hour to penetrate 10 cm.
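
The figures quoted for tin follow directly from equation (3). As a minimal sketch, with α taken as unity by default (the small-p approximation) and a hypothetical function name:

```python
import math

def penetration_time(depth_cm, sigma_emu, p, alpha=1.0):
    """Seconds for the boundary to reach depth_cm, from x0^2 = alpha*p*t/(2*pi*sigma)."""
    return 2.0 * math.pi * sigma_emu * depth_cm ** 2 / (alpha * p)

SIGMA_TIN = 0.5  # e.m.u., the value quoted in the text for pure tin

for depth in (1.0, 10.0):
    seconds = penetration_time(depth, SIGMA_TIN, p=0.1)
    print(f"{depth:5.1f} cm: {seconds:8.1f} s = {seconds / 60.0:5.1f} min")
```

The time grows as the square of the depth, so a tenfold increase in depth costs a hundredfold increase in time.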

In this treatment we have considered the transition to be isothermal,
although, as is well known, the adiabatic transition from the
superconducting to the normal phase results in a drop in temperature. It is
therefore necessary to consider whether it is justifiable to assume that if
the outside surface of the material is maintained at constant temperature
throughout, by contact with liquid helium, there is sufficiently good
thermal conductivity to maintain isothermal conditions at all points.
On account of the formal similarity between equation (1) and the equation
of heat flow the analysis may be carried out by dimensional means. In
order that the transition shall be isothermal, all that is necessary is that
the skin depth for a heat wave at a given frequency shall be much greater
than the corresponding skin depth for an electromagnetic field in the
normal material. That is,

$$\left(\frac{4\pi\sigma\kappa}{C}\right)^{1/2} \gg 1,$$

where κ is the thermal conductivity and C the specific heat per unit
volume. For a good specimen of tin at 3° K the quantity on the left-hand
side takes a value of about 500, so that the assumption of isothermal
conditions is well satisfied. Since, however, √(σκ) is proportional to the
conductivity, it might be necessary to consider thermal effects in
specimens of poor conductivity.


III. THE PHASE TRANSITION IN A CYLINDER.


Consider a long superconducting cylinder of radius a suddenly subjected
to a longitudinal field H_e (= H_c(1+p)). Ideally we may suppose that a
normal sheath will be formed at the surface, and the cylindrical
superconducting core will shrink until finally it vanishes at the centre. The
equation governing the rate of shrinkage will be the same as equation (1),
but now on account of the cylindrical symmetry the method of solution
adopted for the plane slab becomes inapplicable. We shall therefore adopt
the approximate method mentioned in the last section to solve the problem,
assuming the field strength in the normal material to take everywhere the
value H_c.

Let the radius of the superconducting core be r_0; then the magnetic
flux contained in the normal material between the core and a cylinder of
radius r is

$$\Phi = \pi(r^2 - r_0^2)H_c.$$

As the core contracts, the induced electric field at a radius r will be given by

$$E(r) = -\frac{1}{2\pi r}\frac{d\Phi}{dt} = H_c\,\frac{r_0}{r}\,\frac{dr_0}{dt}.$$

The circulating current density at radius r is thus given by

$$J(r) = \sigma H_c\,\frac{r_0}{r}\,\frac{dr_0}{dt}.$$

Now if the external field is H_c(1+p) we must have that

$$4\pi\int_{r_0}^{a} J\,dr = -pH_c.$$

Hence

$$r_0 \log\frac{a}{r_0}\cdot\frac{dr_0}{dt} = -p/4\pi\sigma.$$



Integrating, and putting r_0 = a when t = 0, we obtain the result:

$$1 - \rho^2(1 - 2\log\rho) = t/t_0, \quad \text{where } \rho = r_0/a \text{ and } t_0 = \pi\sigma a^2/p.$$

When ρ = 0, t = t_0, so that t_0 represents the total time required for
destruction of superconductivity in the cylinder. Comparing this result
with equation (4), we see that it is just one-half of the time needed for the
penetration of a plane boundary to a depth a. The manner in which the
superconducting core shrinks is illustrated in fig. 3, where ρ is plotted
against t/t_0. The initial stage of the shrinkage approximates closely to
the behaviour of the plane boundary, but after the radius of the core has
fallen below one-half of the specimen radius the rate of shrinkage increases
once more, until in the final stage the last traces of superconductivity
vanish rapidly.

Fig. 3. [Graph: ρ plotted against t/t_0.]
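
The shape of the shrinkage curve may be reproduced from the result for the cylinder, 1 − ρ²(1 − 2 log ρ) = t/t_0. The sketch below is illustrative only: it evaluates t/t_0 for a given ρ and inverts the relation by bisection, using the fact that the left-hand side falls steadily from 1 to 0 as ρ rises from 0 to 1. The function names are hypothetical.

```python
import math

def elapsed_fraction(rho):
    """t/t0 at which the core radius has fallen to rho*a, for 0 <= rho <= 1."""
    if rho == 0.0:
        return 1.0  # rho^2 * log(rho) tends to zero as rho tends to zero
    return 1.0 - rho * rho * (1.0 - 2.0 * math.log(rho))

def core_radius_fraction(t_over_t0, tol=1e-12):
    """Invert elapsed_fraction by bisection to give rho = r0/a at a given t/t0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if elapsed_fraction(mid) > t_over_t0:
            lo = mid  # core still too small at mid: the true radius is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

for fraction in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(fraction, round(core_radius_fraction(fraction), 4))
```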

The reverse process, the transition from the normal to the
superconducting phase in a field less than critical, is complicated by the
occurrence of "supercooling", the metastable persistence of the normal phase
in a field slightly less than critical. It is no part of this discussion to
enquire into the mechanism responsible for supercooling, which can only
be understood in terms of a detailed theory of the superconducting state,
and of the surface energies at the boundaries between the two phases.
If, however, we suppose that eventually the supercooled phase breaks
down by the formation of a small superconducting nucleus, its subsequent
growth is probably largely determined by electromagnetic forces, and may
be dealt with by the methods already described. The shape of the curve
of fig. 3 suggests that the initial stage of growth may be extremely rapid,
but we must remember that the figure refers to a long cylinder rather than
to a small nucleus of approximately spherical shape. The nucleus will
suffer from demagnetizing effects which will tend to increase the field
strength at the magnetic equator, and decrease it at the poles. Thus the
tendency will be for the most rapid expansion to occur along the magnetic
lines of force, so that a long superconducting spindle will be formed, which
will then expand, if it lies at the centre of the cylinder, after the manner
of fig. 3 (the direction of the time axis being of course reversed).


IV. THE ESTABLISHMENT OF THE INTERMEDIATE STATE.


So far we have considered only examples of the phase transition in a
specimen of zero demagnetizing coefficient, for which ideally the complete
transition should be possible at the field strength H_c. In bodies of more
complicated shape, such as ellipsoids, the transition in an increasing field
can occur reversibly only if it is spread out over a range of field strengths,
and in this range, as has been shown directly by Meshkovsky and Shalnikov
(1947), the material is split into alternate regions of superconducting and
normal material, a condition known as the Intermediate State. A theory
of the equilibrium configuration of the intermediate state has been given
by Landau (1943), and in this section we shall attempt a qualitative
explanation of the mechanism whereby it is established. Let us consider
a plane parallel slab of normal material in a magnetic field whose direction
is perpendicular to the surface of the slab. As the field strength is reduced
below H_c there will be an initial stage during which nothing occurs, the
material remaining in the supercooled normal state. Eventually, however,
we may expect a nucleus of superconducting material to be formed, which
will, in accordance with the ideas of the previous section, expand rapidly
along the field lines, and not so rapidly in a radial direction, until at last
a spindle will extend right across the specimen. This spindle should continue
to expand radially until it takes up such a shape that over the whole
interface between the normal and superconducting regions (though not at
the boundary of the specimen) the field strength is just equal to H_c.
This represents a stable configuration of the spindle. Now, during the
whole of this process, the magnetic field has been expelled from the normal
region, so that the field strength is never reduced below the value which it
had before the nucleus was formed. There seems to be no reason therefore
why any more spindles should be formed in the material until a chance
fluctuation results in the creation of another nucleus. Nevertheless, it is
observed in all studies of the intermediate state that once the supercooled
phase has broken down there is a sudden transition into an intermediate
state filling the whole of the specimen. It appears as though the
production of one nucleus leads to a chain of nucleation processes, enabling
superconducting regions to be formed through the entire specimen.

It is possible that the explanation of this effect is to be found in the
imperfections of the specimen, such as grain boundaries or local strains,
which lead to an irregular jerky growth of a superconducting spindle. If as
the spindle grows along the magnetic lines it meets an impediment, so
that its progress is momentarily retarded, demagnetizing effects will lead
to a decrease of the field strength at its forward end. Just in front of the
impeded spindle there will then be a region which is particularly favourable
for the production of a new nucleus. Once a new nucleus has formed
further growth should proceed roughly as indicated in fig. 4, in which the
arrows attached to the spindles represent the directions favourable to
further growth. If any magnetic lines are trapped between the spindles,


Fig. 4. 


as will always happen unless the second nucleus is created exactly in front
of the first spindle, there will arise electromagnetic forces which will
effectively prevent coalescence of the two spindles. An adequate
treatment of this problem, even by approximate methods, is difficult, but a
qualitative argument should make the general principles clear. Consider
the two spindles, S, shown in cross-section in fig. 5; for coalescence to
occur it is necessary that the magnetic flux in the intervening region shall
be ejected. Now the circuit shown dotted in the diagram may be regarded
as an inductance, L, whose magnitude is determined by the area of the
flux-containing region, N, of normal metal included in the circuit. As the
spindles expand, L will vary approximately as the separation of the
spindles. The resistance, R, of the circuit will be determined by the length
of the path in normal material between the spindles, and thus will also be
roughly proportional to their separation. As the spindles expand, the
time constant L/R for ejection of flux will therefore stay roughly constant,
and the rate at which they approach one another will diminish continuously.
At the same time there is no such retardation of the growth in those
directions in which there are no neighbouring spindles, so that coalescence
is unlikely to occur before the spindles have expanded to such a size that
the field strength in the normal environs has everywhere attained a value
close to H_c, and further growth is impossible. We may therefore picture
the establishment of the intermediate state as a spreading chain of
nucleations, arising from the jerky growth of the initial and subsequent
nuclei, until in the end the whole specimen is filled with superconducting
spindles, and the intervening normal regions are permeated by a magnetic
field of strength near H_c.


Fig. 5.


V. DESTRUCTION OF SUPERCONDUCTIVITY BY A CURRENT. 


As a last problem in the kinetics of phase boundary movement we
shall consider the destruction of superconductivity in a thin cylindrical
wire carrying a current. According to Silsbee's (1918) hypothesis, when
the current reaches the value ½aH_c, a being the radius of the wire, the
field at the surface attains the critical value, and at this point resistance
appears. Silsbee's hypothesis has been found by Scott (1948) to give a
precise description of the critical current for destruction of
superconductivity, even in very thin wires where the effects of surface energy between
the phases might be expected to play some part. The behaviour of
wires carrying currents greater than critical has been studied theoretically
by F. London (1937), who has shown that over a certain range of currents
an intermediate state must be set up. This is observed in practice,
the resistance being only partially restored at the critical current, and then
rising slowly to the normal value with increase of the current. There is
not complete agreement between the experimental behaviour and London's
prediction, and this is attributed by Scott to the effects of surface energy.
This aspect of the problem will not concern us here, since we shall consider
only the kinetic problem of the transition from the superconducting to the
intermediate state, without any discussion of the exact nature of the
intermediate state itself.




The problem differs from that of the cylinder in a longitudinal field,
discussed in Section III, in that here we have a longitudinal current and
a circulating magnetic field. We shall see that this profoundly modifies
the results of the calculation. Let us at first neglect the possibility that
during the movement of the phase boundary there is a return to the
superconducting state of those regions of normal material which find
themselves in a field less than H_c; that is, we shall suppose that at any
instant during the transition the wire consists of an inner cylindrical core
of superconducting material, of radius r_0, surrounded by a sheath of normal
material. If the current is such as to produce a field at the surface of
the wire of strength H_c(1+p), then the total current must be

$$I_t = \tfrac{1}{2}aH_c(1+p),$$

of which the part carried by the superconducting core

$$I_s = \tfrac{1}{2}r_0H_c,$$

in order to maintain a field strength H_c at the phase boundary. The
remaining current will be carried in the normal metal,

$$I_n = \tfrac{1}{2}H_c\{a(1+p) - r_0\}. \qquad (5)$$


We shall now use the approximate method of solution which assumes
the field strength in the normal metal to have everywhere the value H_c.
It immediately follows that the electric field induced by a shrinkage of
the core may be written in the form E = −H_c dr_0/dt at all points in the
normal metal. There is thus a uniform current density in the normal
metal, J_n = −σH_c dr_0/dt, and the total current carried by the normal metal,

$$I_n = -\pi\sigma H_c(a^2 - r_0^2)\,\frac{dr_0}{dt}.$$

Comparing this result with equation (5) we have an expression for the
rate of shrinkage,

$$\frac{dr_0}{dt} = -\frac{a(1+p) - r_0}{2\pi\sigma(a^2 - r_0^2)}.$$
Integrating this equation, we find an expression for the time taken for
the superconducting core to shrink to a radius r_0,

$$t/2\pi\sigma a^2 = \tfrac{3}{2} + p - \rho(1 + p + \tfrac{1}{2}\rho) + p(2+p)\log\frac{p}{1+p-\rho}, \qquad (6)$$

where ρ = r_0/a. In particular, the time taken for the superconducting
core to disappear entirely is given by

$$t_0/2\pi\sigma a^2 = \tfrac{3}{2} + p + p(2+p)\log\frac{p}{1+p}.$$




The detailed form of this function is of little validity, as we shall see when
we come to discuss the justification for the approximate method of
solution, but its interest lies in the fact that if we put p equal to zero, we
find a finite value, 3πσa², for t_0. Thus the calculation predicts that, in
contrast to the behaviour of a cylinder in a longitudinal field, as soon as
the current reaches the critical value, the transition may be completed in
a finite time. For wires of the size used in experiments on the destruction
of superconductivity by a current this time is very short. If the radius
of the wire is ½ mm., and the material pure tin, the transition should be
completed in 12 milliseconds.
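
The closed expression for the shrinkage time of the current-carrying wire, and the figure of 12 milliseconds, may be checked as follows. This is a sketch under the assumptions of the first approximation only: the closed form is taken as t/2πσa² = 3/2 + p − ρ(1 + p + ½ρ) + p(2+p) log{p/(1+p−ρ)}, and it is compared with a direct midpoint-rule integration of the rate of shrinkage. The function names are hypothetical.

```python
import math

def shrink_time(rho, p, sigma_emu, a_cm):
    """Closed-form time (s) for the core to shrink from radius a to rho*a."""
    # The logarithmic term tends to zero as p tends to zero.
    log_term = 0.0 if p == 0.0 else p * (2.0 + p) * math.log(p / (1.0 + p - rho))
    reduced = 1.5 + p - rho * (1.0 + p + 0.5 * rho) + log_term
    return 2.0 * math.pi * sigma_emu * a_cm ** 2 * reduced

def shrink_time_numeric(rho, p, sigma_emu, a_cm, steps=20000):
    """Midpoint-rule integral of (1 - x^2)/(1 + p - x) from x = rho to x = 1."""
    width = (1.0 - rho) / steps
    total = 0.0
    for i in range(steps):
        x = rho + (i + 0.5) * width
        total += (1.0 - x * x) / (1.0 + p - x)
    return 2.0 * math.pi * sigma_emu * a_cm ** 2 * total * width

# Pure tin (sigma = 0.5 e.m.u.), wire radius 0.05 cm, critical current (p = 0).
print(f"t0 = {shrink_time(0.0, 0.0, 0.5, 0.05) * 1e3:.1f} ms")
print(shrink_time(0.0, 0.2, 0.5, 0.05), shrink_time_numeric(0.0, 0.2, 0.5, 0.05))
```

The second line of output places the closed form beside the numerical integral for p = 0.2, as a check on the integration.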

So far it has been assumed that the transition consists solely of the
shrinkage of the superconducting core until it finally disappears, but this
cannot be the complete solution to the problem, since London has shown
that stability is only achieved by the formation of an intermediate state.
In order to understand how the intermediate state is set up, we must look
into the magnetic field distribution in the normal metal. The field
strength has been taken as H_c everywhere, but this assumption is not
justified. In fact, if we now use our result that the current density is
uniform in the normal metal to calculate the magnetic field, we find that
when p = 0 the field varies with radius according to the law

$$H = H_c\,\frac{ar_0 + r^2}{r(a + r_0)}. \qquad (7)$$

This takes the value H_c at r = r_0 and at r = a, in accordance with the
boundary conditions, but is less than H_c at all intermediate radii. The
minimum value of H occurs at a radius √(ar_0), and at this point

$$H_{min} = H_c\,\frac{2\sqrt{ar_0}}{a + r_0}.$$

As the core contracts, then, there are left behind regions of normal metal
where the field strength is less than H_c, and when r_0 ≪ a, there are regions
near the centre of the wire where the field is very low. Presumably it is
here that nucleation may occur for the establishment of the intermediate
state, but no attempt has been made to carry the argument further, as it
is difficult to conjecture the configuration of normal and superconducting
regions which is stable in a wire carrying a current.
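
The field distribution in the normal sheath may be examined numerically. The sketch below assumes the law H/H_c = (ar_0 + r²)/{r(a + r_0)} for p = 0, as follows from a uniform current density in the sheath, and confirms that the field equals H_c at both boundaries and has its minimum at r = √(ar_0). The function names are hypothetical.

```python
import math

def field_ratio(r, r0, a):
    """H/H_c at radius r (r0 <= r <= a) for a uniform sheath current and p = 0."""
    return (a * r0 + r * r) / (r * (a + r0))

def minimum_field(r0, a):
    """Radius of the minimum field and the value of H/H_c there."""
    r_min = math.sqrt(a * r0)
    return r_min, field_ratio(r_min, r0, a)

a = 1.0
for r0 in (0.5, 0.25, 0.05, 0.01):
    r_min, h_min = minimum_field(r0, a)
    print(f"r0/a = {r0:5.2f}: minimum H/H_c = {h_min:.3f} at r/a = {r_min:.3f}")
```

As the core shrinks the minimum moves inwards and becomes deeper, which is the behaviour described in the text for r_0 much smaller than a.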
The manner in which the magnetic field is decreased below H_c in the
normal metal suggests that the currents induced by the shrinkage of the
core may have been overestimated, and hence the rate of shrinkage
underestimated. A second approximation may be calculated by starting
from the assumption that the field in the normal metal is given by equation
(7), and it is found that in the final stage of collapse of the core the rate
is indeed greater than was found in the first approximation. But the
calculation is laborious, and the final result is complicated and
mathematically not very well-behaved, so that it seems hardly worth while
attempting any further stages of approximation.




The physical reason for the difference in behaviour of a cylinder when
carrying a current and when subjected to a longitudinal field may be seen
by considering the stability of the system under critical conditions. In
the latter case, when the field strength is H_c, any diameter of
superconducting material is stable, and the system may be said to be in neutral
equilibrium. On the other hand, if we pass a critical current through the
cylinder, and then suppose a normal sheath to be formed, eventually all
the current will redistribute itself on the superconducting core of smaller
diameter, producing there a field greater than H_c. A wire carrying the
critical current is thus in unstable equilibrium, and as soon as any portion
of the surface becomes normal there is a tendency for the surface current
density in that region to increase and precipitate the transition into the
normal state. This is probably the reason why Silsbee's hypothesis
applies very well even in extremely thin wires, of the order of diameter of
those which show effects of surface energy in transverse magnetic fields.
The surface energy is unlikely to be great enough to counteract the instability
of a wire carrying the critical current, though, of course, it will play a part
in determining the configuration of normal and superconducting regions in
the intermediate state, and hence the way in which the resistance
approaches the normal value once the critical current has been exceeded.
Such an hypothesis would be consistent with the observations of Scott.


I should like to thank Mr. T. E. Faber and Dr. B. Serin for interesting 
discussions on some of the points raised in this paper. 


REFERENCES. 


Carslaw, H. S. and Jaeger, J. C., 1947, Conduction of Heat in Solids, p. 71 
(Clarendon Press, Oxford). 

Faber, T. E., 1949, Nature, 164, 277. 

Landau, L., 1943, Jour. of Physics (USSR), 7, 99. 

Lazarev, B. G., Galkin, A. A., and Khotkevich, V. I., 1947, C.R. Acad. Sci. 
USSR, 55, 805. 

London, F., 1937, Une conception nouvelle de la supraconductibilité (Hermann 
et Cie.). 

Meshkovsky, A. and Shalnikov, A., 1947, Jour. of Exp. and Theor. Physics 
(USSR), 17, 851. 

Scott, R. B., 1948, Bull. Nat. Bur. of Standards, 41, 581. 

Silsbee, F. B., 1918, Bull. Nat. Bur. of Standards, 14, 301. 




XXII. Programming a Computer for Playing Chess *. 


By CLAUDE E. SHANNON, 
Bell Telephone Laboratories, Inc., Murray Hill, N.J. † 


[Received November 8, 1949.] 


1. INTRODUCTION. 


THIS paper is concerned with the problem of constructing a computing 
routine or “program” for a modern general purpose computer which 
will enable it to play chess. Although perhaps of no practical importance, 
the question is of theoretical interest, and it is hoped that a satisfactory 
solution of this problem will act as a wedge in attacking other problems 
of a similar nature and of greater significance. Some possibilities in this 
direction are :— 


(1) Machines for designing filters, equalizers, etc. 

(2) Machines for designing relay and switching circuits. 

(3) Machines which will handle routing of telephone calls based on 
the individual circumstances rather than by fixed patterns. 

(4) Machines for performing symbolic (non-numerical) mathematical 
operations. 

(5) Machines capable of translating from one language to another. 

(6) Machines for making strategic decisions in simplified military 
operations. 

(7) Machines capable of orchestrating a melody. 

(8) Machines capable of logical deduction. 


It is believed that all of these and many other devices of a similar 
nature are possible developments in the immediate future. The techniques 
developed for modern electronic and relay type computers make them 
not only theoretical possibilities, but in several cases worthy of serious 
consideration from the economic point of view. 

Machines of this general type are an extension over the ordinary use 
of numerical computers in several ways. First, the entities dealt with are 
not primarily numbers, but rather chess positions, circuits, mathematical 
expressions, words, etc. Second, the proper procedure involves general 
principles, something of the nature of judgment, and considerable trial 
and error, rather than a strict, unalterable computing process. Finally, 
the solutions of these problems are not merely right or wrong but have 
a continuous range of “quality” from the best down to the worst. We 
might be satisfied with a machine that designed good filters even though 
they were not always the best possible. 


* First presented at the National IRE Convention, March 9, 1949, New 
York, U.S.A. 


† Communicated by the Author. 




The chess machine is an ideal one to start with, since : (1) the problem 
is sharply defined both in allowed operations (the moves) and in the 
ultimate goal (checkmate); (2) it is neither so simple as to be trivial 
nor too difficult for satisfactory solution; (3) chess is generally con- 
sidered to require “ thinking ” for skilful play ; a solution of this problem 
will force us either to admit the possibility of mechanized thinking or 
to further restrict our concept of “ thinking”; (4) the discrete structure 
of chess fits well into the digital nature of modern computers. 

There is already a considerable literature on the subject of chess- 
playing machines. During the late 18th and early 19th centuries, the 
Maelzel Chess Automaton, a device invented by von Kempelen, was 
exhibited widely as a chess-playing machine. A number of papers appeared 
at the time, including an analytical essay by Edgar Allan Poe (entitled 
Maelzel’s Chess Player) purporting to explain its operation. Most of 
these writers concluded, quite correctly, that the Automaton was 
operated by a concealed human chess-master ; the arguments leading to 
this conclusion, however, were frequently fallacious. Poe assumes, for 
example, that it is as easy to design a machine which will invariably 
win as one which wins occasionally, and argues that since the Automaton 
was not invincible it was therefore operated by a human, a clear non 
sequitur. For a complete account of the history and method of operation 
of the Automaton, the reader is referred to a series of articles by Harkness 
and Battell in Chess Review, 1947. 

A more honest attempt to design a chess-playing machine was made 
in 1914 by Torres y Quevedo, who constructed a device which played 
an end game of king and rook against king (Vigneron, 1914). The machine 
played the side with king and rook and would force checkmate in a few 
moves however its human opponent played. Since an explicit set of 
rules can be given for making satisfactory moves in such an end game, 
the problem is relatively simple, but the idea was quite advanced for 
that period. 

The thesis we will develop is that modern general purpose computers 
can be used to play a tolerably good game of chess by the use of a suitable 
computing routine or “program”. While the approach given here is 
believed fundamentally sound, it will be evident that much further 
experimental and theoretical work remains to be done. 




2. GENERAL CONSIDERATIONS. 


A chess “ position ” may be defined to include the following data :— 


(1) A statement of the positions of all pieces on the board. 
(2) A statement of which side, White or Black, has the move. 


(3) A statement as to whether the kings and rooks have moved. This 
is important since by moving a rook, for example, the right to castle on 
that side is forfeited. 




(4) A statement of, say, the last move. This will determine whether 
a possible en passant capture is legal, since this privilege is forfeited after 
one move. 

(5) A statement of the number of moves made since the last pawn 
move or capture. This is important because of the 50 move drawing rule. 
For simplicity, we will ignore the rule of draw after three repetitions of 
a position. 

In chess there is no chance element apart from the original choice 
of which player has the first move. This is in contrast with card games, 
backgammon, etc. Furthermore, in chess each of the two opponents has 
“perfect information” at each move as to all previous moves (in contrast 
with Kriegspiel, for example). These two facts imply (von Neumann and 
Morgenstern, 1944) that any given position of the chess pieces must be 
either : 

(1) A won position for White. That is, White can force a win, however 
Black defends. 

(2) A draw position. White can force at least a draw, however Black 
plays, and likewise Black can force at least a draw, however White plays. 
If both sides play correctly the game will end in a draw. 

(3) A won position for Black. Black can force a win, however White 
plays. 

This is, for practical purposes, of the nature of an existence theorem. 
No practical method is known for determining to which of the three 
categories a general position belongs. If there were, chess would lose 
most of its interest as a game. One could determine whether the initial 
position is won, drawn, or lost for White and the outcome of a game 
between opponents knowing the method would be fully determined at 
the choice of the first move. Supposing the initial position a draw (as 
suggested by empirical evidence from master games*) every game would 
end in a draw. 

It is interesting that a slight change in the rules of chess gives a game 
for which it is provable that White has at least a draw in the initial 
position. Suppose the rules the same as those of chess except that a 
player is not forced to move a piece at his turn to play, but may, if he 
chooses, “pass”. Then we can prove as a theorem that White can at 
least draw by proper play. For in the initial position either he has a 
winning move or not. If so, let him make this move. If not, let him 
pass. Black is now faced with essentially the same position that White 
had before, because of the mirror symmetry of the initial position †. 
Since White had no winning move before, Black has none now. Hence, 
Black at best can draw. Therefore, in either case White can at least draw. 


* The world championship match between Capablanca and Alekhine ended 
with the score Alekhine 6, Capablanca 3, drawn 25. 

† The fact that the number of moves remaining before a draw is called 
by the 50-move rule has decreased does not affect this argument. 




In some games there is a simple evaluation function f(P) which can 
be applied to a position P and whose value determines to which category 
(won, lost, etc.) the position P belongs. In the game of Nim (Hardy and 
Wright, 1938), for example, this can be determined by writing the number 
of matches in each pile in binary notation. These numbers are arranged 
in a column (as though to add them). If the number of ones in each column 
is even, the position is lost for the player about to move, otherwise won. 
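
The binary-column rule for Nim can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `column_one_counts` and `nim_f` are hypothetical, and the values +1 and -1 are a convention assumed here for positions won and lost for the player about to move.

```python
def column_one_counts(piles):
    """Count the ones in each binary column when the pile sizes are stacked.

    Index 0 of the result is the units column.
    """
    width = max(piles, default=0).bit_length()
    return [sum((p >> k) & 1 for p in piles) for k in range(width)]


def nim_f(piles):
    """Return -1 (lost for the player about to move) if every column holds
    an even number of ones, otherwise +1 (won)."""
    return -1 if all(c % 2 == 0 for c in column_one_counts(piles)) else 1
```

Piles of 1, 2 and 3 matches give an even count in every column, so the player about to move is lost; piles of 3, 4 and 5 leave an odd count in one column, so the mover can win.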

If such an evaluation function f(P) can be found for a game it is easy to 
design a machine capable of perfect play. It would never lose or draw 
a won position and never lose a drawn position and if the opponent ever 
made a mistake the machine would capitalize on it. This could be 
done as follows: Suppose 


$f(P)=1$ for a won position, 
$f(P)=0$ for a drawn position, 
$f(P)=-1$ for a lost position. 


At the machine’s turn to move it calculates f(P) for the various positions 
obtained from the present position by each possible move that can be 
made. It chooses that move (or one of the set) giving the maximum 
value to f. In the case of Nim where such a function f(P) is known, a 
machine has actually been constructed which plays a perfect game *. 
With chess it is possible, in principle, to play a perfect game or construct 
a machine to do so as follows: One considers in a given position all 
possible moves, then all moves for the opponent, etc., to the end of the 
game (in each variation). The end must occur, by the rules of the game, 
after a finite number of moves † (remembering the 50 move drawing rule). 
Each of these variations ends in win, loss or draw. By working backward 
from the end one can determine whether there is a forced win, the 
position is a draw or is lost. It is easy to show, however, that even with 
the high computing speeds available in electronic calculators this computation 
is impractical. In typical chess positions there will be of the 
order of 30 legal moves. The number holds fairly constant until the game 
is nearly finished as shown in Fig. 1. This graph was constructed from 
data given by De Groot, who averaged the number of legal moves in a 
large number of master games (De Groot, 1946, a). Thus a move for 
White and then one for Black gives about $10^3$ possibilities. A typical 
game lasts about 40 moves to resignation of one party. This is conservative 
for our calculation since the machine should calculate out to 
checkmate, not resignation. However, even at this figure there will be 
$10^{120}$ variations to be calculated from the initial position. A machine 
operating at the rate of one variation per micro-microsecond would 
require over $10^{90}$ years to calculate its first move ! 


* Condon, Tawney and Derr, U.S. Patent 2,215,544. The “Nimotron” 
based on this patent was built and exhibited by Westinghouse at the 1938 
New York World’s Fair. 

† The longest possible chess game is 6350 moves, allowing 50 moves between 
each pawn move or capture. The longest tournament game on record between 
masters lasted 168 moves, and the shortest four moves. (Chernev, Curious 
Chess Facts, The Black Knight Press, 1937.) 

Another (equally impractical) method is to have a “dictionary” of 
all possible positions of the chess pieces. For each possible position there 
is an entry giving the correct move (either calculated by the above 
process or supplied by a chess master). At the machine’s turn to move 
it merely looks up the position and makes the indicated move. The 
number of possible positions, of the general order of $64!/[32!\,(8!)^2\,(2!)^6]$, or 
roughly $10^{43}$, naturally makes such a design unfeasible. 
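
The orders of magnitude in the last two paragraphs can be checked with exact integer arithmetic. The sketch below assumes the figures as read here: about 30 legal moves per position, 40 moves per game, one variation per micro-microsecond (that is, $10^{12}$ per second), and a position count of $64!/[32!\,(8!)^2\,(2!)^6]$. The variable names and the helper `order_of_magnitude` are illustrative only.

```python
import math

MOVES_PER_POSITION = 30   # typical number of legal moves
GAME_LENGTH = 40          # moves by each side in a typical game

pair = MOVES_PER_POSITION ** 2          # a White move then a Black move: 900, about 10^3
variations = (10 ** 3) ** GAME_LENGTH   # 10^120 using the rounded figure

RATE = 10 ** 12                         # variations per second
SECONDS_PER_YEAR = 365 * 24 * 3600
years = variations // (RATE * SECONDS_PER_YEAR)

# Size of the "dictionary" of positions
positions = math.factorial(64) // (
    math.factorial(32) * math.factorial(8) ** 2 * math.factorial(2) ** 6
)


def order_of_magnitude(n):
    """Power of ten of the leading digit of a positive integer."""
    return len(str(n)) - 1


print(order_of_magnitude(variations),
      order_of_magnitude(years),
      order_of_magnitude(positions))
```

The computed number of years comfortably exceeds the bound of $10^{90}$ given in the text, and the position count comes out at a few times $10^{42}$, in agreement with "roughly $10^{43}$".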

It is clear then that the problem is not that of designing a machine 
to play perfect chess (which is quite impractical) nor one which merely 
plays legal chess (which is trivial). We would like it to play a skilful 
game, perhaps comparable to that of a good human player. 

[Fig. 1. $\log_{10}$ (number of legal moves) plotted against move in game.] 

A strategy for chess may be described as a process for choosing a move 
in any given position. If the process always chooses the same move in 
the same position the strategy is known in the theory of games as a 
“pure” strategy. If the process involves statistical elements and does 
not always result in the same choice it is a “mixed” strategy. The 
following are simple examples of strategies : 


(1) Number the possible legal moves in the position P, according to 
some standard procedure. Choose the first on the list. This is a pure 
strategy. 

(2) Number the legal moves and choose one at random from the list. 
This is a mixed strategy. 

Both of these, of course, are extremely poor strategies, making no 
attempt to select good moves. Our problem is to develop a tolerably good 
strategy for selecting the move to be made. 
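
The two example strategies may be sketched as follows. The "standard procedure" for numbering the moves is assumed here to be alphabetical order of the move names, which is purely an illustrative choice; the function names are likewise hypothetical.

```python
import random


def first_move_strategy(legal_moves):
    """Pure strategy: always the first move in the standard ordering."""
    return sorted(legal_moves)[0]


def random_move_strategy(legal_moves, rng=random):
    """Mixed strategy: a move drawn at random from the numbered list."""
    return rng.choice(sorted(legal_moves))
```

The first function returns the same move every time it is given the same list, while the second generally does not, which is exactly the distinction between a pure and a mixed strategy.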




3. APPROXIMATE EVALUATING FUNCTIONS. 


Although in chess there is no known simple and exact evaluating 
function f(P), and probably never will be because of the arbitrary and 
complicated nature of the rules of the game, it is still possible to perform 
an approximate evaluation of a position. Any good chess player must, 
in fact, be able to perform such a position evaluation. Evaluations are 
based on the general structure of the position, the number and kind of 
Black and White pieces, pawn formation, mobility, etc. These evaluations 
are not perfect, but the stronger the player the better his evaluations. 
Most of the maxims and principles of correct play are really assertions 
about evaluating positions, for example :— 


(1) The relative values of queen, rook, bishop, knight and pawn are 
about 9, 5, 3, 3, 1, respectively. Thus other things being equal (!) if we 
add the numbers of pieces for the two sides with these coefficients, the 
side with the largest total has the better position. 

(2) Rooks should be placed on open files. This is part of a more 
general principle that the side with the greater mobility, other things 
equal, has the better game. 

(3) Backward, isolated and doubled pawns are weak. 

(4) An exposed king is a weakness (until the end game). 


These and similar principles are only generalizations from empirical 
evidence of numerous games, and only have a kind of statistical validity. 
Probably any chess principle can be contradicted by particular counter 
examples. However, from these principles one can construct a crude 
evaluation function. The following is an example :— 
$$\begin{aligned} f(P) ={}& 200(K-K')+9(Q-Q')+5(R-R')+3(B-B'+N-N')+(P-P') \\ & -0.5(D-D'+S-S'+I-I')+0.1(M-M')+\ldots \end{aligned}$$

in which :— 

K, Q, R, B, N, P are the number of White kings, queens, rooks, bishops, 
knights and pawns on the board. 

D, S, I are doubled, backward and isolated White pawns. 

M=White mobility (measured, say, as the number of legal moves 
available to White). 

Primed letters are the similar quantities for Black. 

The coefficients 0.5 and 0.1 are merely the writer’s rough estimate. 
Furthermore, there are many other terms that should be included *. 
The formula is given only for illustrative purposes. Checkmate has been 
artificially included here by giving the king the large value 200 (anything 
greater than the maximum of all other terms would do). 
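
As an illustration, the formula can be written out as a small function. The sketch assumes the counts for each side have already been obtained from the position; the `Side` record and its field names are hypothetical, and only the terms written explicitly in the formula are included.

```python
from dataclasses import dataclass


@dataclass
class Side:
    """Counts for one side; the field names are illustrative."""
    kings: int = 1
    queens: int = 0
    rooks: int = 0
    bishops: int = 0
    knights: int = 0
    pawns: int = 0
    doubled: int = 0
    backward: int = 0
    isolated: int = 0
    mobility: int = 0


def evaluate(white: Side, black: Side) -> float:
    """Approximate evaluation f(P), positive when White stands better."""
    w, b = white, black
    return (200 * (w.kings - b.kings)
            + 9 * (w.queens - b.queens)
            + 5 * (w.rooks - b.rooks)
            + 3 * (w.bishops - b.bishops + w.knights - b.knights)
            + (w.pawns - b.pawns)
            - 0.5 * (w.doubled - b.doubled
                     + w.backward - b.backward
                     + w.isolated - b.isolated)
            + 0.1 * (w.mobility - b.mobility))
```

With equal material, pawn structure and mobility the value is zero; a pawn ahead is worth 1, reduced to 0.5 if one of the extra side's pawns is doubled.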

* See Appendix I. 

It may be noted that this approximate evaluation f(P) has a more 
or less continuous range of possible values, while with an exact evaluation 
there are only three possible values. This is as it should be. In practical 
play a position may be an “easy win” if a player is, for example, a 
queen ahead, or a very difficult win with only a pawn advantage. In 
the former case there are many ways to win while in the latter exact 
play is required, and a single mistake often destroys the advantage. 
The unlimited intellects assumed in the theory of games, on the other 
hand, never make a mistake and the smallest winning advantage is as 
good as mate in one. A game between two such mental giants, Mr. A 
and Mr. B, would proceed as follows. They sit down at the chessboard, 
draw for colours, and then survey the pieces for a moment. Then either 

(1) Mr. A says, “I resign” or 

(2) Mr. B says, “I resign” or 

(3) Mr. A says, “I offer a draw,” and Mr. B replies, “I accept.” 

4. STRATEGY BASED ON AN EVALUATION FUNCTION. 


A very important point about the simple type of evaluation function 
given above (and general principles of chess) is that they can only be 
applied in relatively quiescent positions. For example, in an exchange 
of queens White plays, say, Q×Q (× = captures) and Black will reply 
P×Q. It would be absurd to calculate the function f(P) after Q×Q 
while White is, for a moment, a queen ahead, since Black will immediately 
recover it. More generally it is meaningless to calculate an 
evaluation of the general type given above during the course of a combination 
or a series of exchanges. 

More terms could be added to f(P) to account for exchanges in progress, 
but it appears that combinations, and forced variations in general, are 
better accounted for by examination of specific variations. This is, in 
fact, the way chess players calculate. A certain number of variations 
are investigated move by move until a more or less quiescent position 
is reached and at this point something of the nature of an evaluation is 
applied to the resulting position. The player chooses the variation 
leading to the highest evaluation for him when the opponent is assumed 
to be playing to reduce this evaluation. 

The process can be described mathematically. We omit at first the 
fact that f(P) should only be applied in quiescent positions. A strategy 
of play based on f(P) and operating one move deep is the following. 
Let $M_1, M_2, M_3, \ldots, M_s$ be the moves that can be made in position P 
and let $M_1P$, $M_2P$, etc. denote symbolically the resulting positions when 
$M_1$, $M_2$, etc. are applied to P. Then one chooses the $M_m$ which maximizes 
$f(M_mP)$. 

A deeper strategy would consider the opponent’s replies. Let $M_{i1}$, 
$M_{i2}, \ldots, M_{is}$ be the possible answers by Black, if White chooses move 
$M_i$. Black should play to minimize f(P). Furthermore, his choice occurs 
after White’s move. Thus, if White plays $M_i$ Black may be assumed to 
play the $M_{ij}$ such that 

$$f(M_{ij}M_iP)$$

is a minimum. White should play his first move such that f is a maximum 
after Black chooses his best reply. Therefore, White should play to 
maximize on $M_i$ the quantity 

$$\min_{M_{ij}} f(M_{ij}M_iP).$$


The mathematical process involved is shown for a simple case in Fig. 2. 
The point at the left represents the position being considered. It is 
assumed that there are three possible moves for White, indicated by 
the three solid lines, and if any of these is made there are three possible 
moves for Black, indicated by the dashed lines. The possible positions 
after a White and Black move are then the nine points on the right, and 
the numbers are the evaluations for these positions. Minimizing on the 
upper three gives +0.1 which is the resulting value if White chooses the 
upper variation and Black replies with his best move. Similarly, the 
second and third moves lead to values of -7 and -6. Maximizing on 
White’s move, we obtain +0.1 with the upper move as White’s best choice. 

[Fig. 2.] 
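
The calculation for one move on each side can be sketched as below. The nine leaf evaluations are not recoverable from the figure as reproduced here, so the table `LEAVES` is hypothetical, chosen only so that the three row minima agree with the values quoted in the text; the function name is also illustrative.

```python
def best_white_move(evaluations):
    """evaluations[i][j] is f after White's move i and Black's reply j.

    Returns the index of White's best move and the value it guarantees.
    """
    guaranteed = [min(replies) for replies in evaluations]   # Black minimizes
    best = max(range(len(guaranteed)), key=guaranteed.__getitem__)  # White maximizes
    return best, guaranteed[best]


# Hypothetical leaf values (rows: White's three moves; columns: Black's replies)
LEAVES = [
    [0.3, 0.1, 0.6],
    [-7.0, -0.5, 0.2],
    [-6.0, -1.5, 0.3],
]
```

Applied to `LEAVES`, the row minima are 0.1, -7 and -6, and the upper move (index 0) is selected.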

In a similar way a two-move strategy (based on considering all variations 
out to 2 moves) is given by 

$$\max_{M_i}\ \min_{M_{ij}}\ \max_{M_{ijk}}\ \min_{M_{ijkl}} f(M_{ijkl}M_{ijk}M_{ij}M_iP). \tag{1}$$

The order of maximizing and minimizing this function is important. It 
derives from the fact that the choices of moves occur in a definite order. 

A machine operating on this strategy at the two-move level would 
first calculate all variations out to two moves (for each side) and the 
resulting positions. The evaluations f(P) are calculated for each of these 
positions. Fixing all but the last Black move, this last is varied and the 
move chosen which minimizes f. This is Black’s assumed last move in 
the variation in question. Another move for White’s second move is 
chosen and the process repeated for Black’s second move. This is done 
for each second White move and the one chosen giving the largest final f 
(after Black’s best assumed reply in each case). In this way White’s 
second move in each variation is determined. Continuing in this way 
the machine works back to the present position and the best first White 
move. This move is then played. This process generalizes in the obvious 
way for any number of moves. 

A strategy of this sort, in which all variations are considered out to a 
definite number of moves and the move then determined from a formula 
such as (1) will be called a type A strategy. The type A strategy has 
certain basic weaknesses, which we will discuss later, but is conceptually 
simple, and we will first show how a computer can be programmed for 
such a strategy. 
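
Before turning to the machine, the type A strategy may be sketched as a short recursive routine. This is a minimal sketch, not the program outlined in the next section: the functions `legal_moves`, `make_move` and `f` are supplied by the caller, depth is counted in single moves (so a "two-move" strategy in the sense above has `plies=4`), and the toy game of nested lists is purely hypothetical.

```python
def type_a_value(position, plies, white_to_move, legal_moves, make_move, f):
    """Value of `position` when all variations are followed `plies` single moves."""
    moves = legal_moves(position) if plies > 0 else []
    if not moves:                       # depth reached, or no move available
        return f(position)
    values = [
        type_a_value(make_move(position, m), plies - 1, not white_to_move,
                     legal_moves, make_move, f)
        for m in moves
    ]
    return max(values) if white_to_move else min(values)


def type_a_move(position, plies, legal_moves, make_move, f):
    """White's best first move, together with the value it leads to."""
    scored = [
        (type_a_value(make_move(position, m), plies - 1, False,
                      legal_moves, make_move, f), m)
        for m in legal_moves(position)
    ]
    value, move = max(scored, key=lambda pair: pair[0])
    return move, value


# Toy game: a position is a nested list, a move is an index, a leaf is its value.
def toy_moves(p):
    return list(range(len(p))) if isinstance(p, list) else []


def toy_make(p, m):
    return p[m]


def toy_f(p):
    return p


TREE = [[[[3, 5], [2, 9]], [[0, 1], [7, 4]]],
        [[[6, 8], [1, 2]], [[9, 3], [5, 5]]]]
```

For `TREE`, which is two moves deep for each side, the routine works back from the leaves in the order max, min, max, min and selects the second first move, with value 5.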


5. PROGRAMMING A GENERAL PURPOSE COMPUTER FOR A 
TYPE A STRATEGY. 

We assume a large-scale digital computer, indicated schematically in 
Fig. 3, with the following properties : 

[Fig. 3. Block diagram of the computer; labels: Memory, Program, Arithmetic Device, Control, Results, Calculations.] 

(1) There is a large internal memory for storing numbers. The memory 
is divided into a number of boxes each capable of holding, say, a ten-digit 
number. Each box is assigned a “box number”. 

(2) There is an arithmetic organ which can perform the elementary 
operations of addition, multiplication, etc. 

(3) The computer operates under the control of a “program”. 
The program consists of a sequence of elementary “orders”. A 
typical order is A 372, 451, 133. This means, extract the contents of 
box 372 and of box 451, add these numbers, and put the sum in box 133. 
Another type of order involves a decision, for example, C 291, 118, 345. 
This tells the machine to compare the contents of box 291 and 118. If 
the first is larger the machine goes on to the next order in the program. 
If not, it takes its next order from box 345. This type of order enables 
the machine to choose from alternative procedures, depending on the 
results of previous calculations. It is assumed that orders are available 
for transferring numbers, the arithmetic operations, and decisions. 
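
The two kinds of order just described, A (add) and C (compare), can be imitated by a toy interpreter. This is a sketch under several assumptions not made in the text: orders and numbers are held in two separate tables, each order occupies one box, the machine stops when it reaches a box holding no order, and the ten-digit capacity of a box is not modelled. The box numbers in `ORDERS` are hypothetical.

```python
def run(orders, memory, start, max_steps=1000):
    """Obey orders beginning at box `start`; return the final number store."""
    box = start
    for _ in range(max_steps):
        if box not in orders:          # no order stored here: stop (assumed)
            return memory
        op, p, q, r = orders[box]
        if op == "A":                  # A p, q, r: put (p) + (q) in box r
            memory[r] = memory[p] + memory[q]
            box += 1
        elif op == "C":                # C p, q, r: go on if (p) > (q), else jump to r
            box = box + 1 if memory[p] > memory[q] else r
        else:
            raise ValueError(f"unknown order {op!r}")
    raise RuntimeError("step limit reached")


ORDERS = {
    100: ("A", 372, 451, 133),   # sum of boxes 372 and 451 into box 133
    101: ("C", 133, 118, 345),   # if the sum exceeds box 118, go on; else jump
    102: ("A", 133, 133, 134),   # larger case: double the sum
    345: ("A", 118, 118, 134),   # other case: double box 118 instead
}
```

Starting at box 100, the result left in box 134 depends on the comparison, which is the sense in which the C order lets the machine choose between alternative procedures.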




Our problem is to represent chess as numbers and operations on 
numbers, and to reduce the strategy decided upon to a sequence of 
computer orders. We will not carry this out in detail but only outline 
the programs. As a colleague puts it, the final program for a 
computer must be written in words of one microsyllable. 

The rather Procrustean tactics of forcing chess into an arithmetic 
computer are dictated by economic considerations. Ideally, we would 
like to design a special computer for chess containing, in place of the 
arithmetic organ, a “chess organ” specifically designed to perform the 
simple chess calculations. Although a large improvement in speed of 
operation would undoubtedly result, the initial cost of computers seems 
to prohibit such a possibility. It is planned, however, to experiment with 
a simple strategy on one of the numerical computers now being 
constructed. 


[Fig. 4. Chessboard (Black at the top) showing the co-ordinate numbering of the squares, with the following codes.] 

Code for pieces: 

           P    N    B    R    Q    K 
White      1    2    3    4    5    6 
Black     -1   -2   -3   -4   -5   -6 
0 = empty square 

Code for move: (old square, new square, new piece (if promotion)) 

P-K4 → (14, 34, -) 
P-K8(Q) → (64, 74, 5) 


A game of chess can be divided into three phases, the opening, the 
middle game, and the end game. Different principles of play apply 
in the different phases. In the opening, which generally lasts for about 
ten moves, development of the pieces to good positions is the main 
objective. During the middle game tactics and combinations are pre- 
dominant. This phase lasts until most of the pieces are exchanged, 
leaving only kings, pawns and perhaps one or two pieces on each side. 
The end game is mainly concerned with pawn promotion. Exact timing 
and such possibilities as “Zugzwang”, stalemate, etc. become important. 

Due to the difference in strategic aims, different programs should 
be used for the different phases of a game. We will be chiefly concerned 
with the middle game and will not consider the end game at all. There 
seems no reason, however, why an end game strategy cannot be designed 
and programmed equally well. 




A square on a chessboard can be occupied in 13 different ways: either 
it is empty (0) or occupied by one of the six possible kinds of White 
pieces (P=1, N=2, B=3, R=4, Q=5, K=6) or one of the six possible 
Black pieces (P=-1, N=-2, ..., K=-6). Thus, the state of a square 
is specified by giving an integer from -6 to +6. The 64 squares can be 
numbered according to a co-ordinate system as shown in Fig. 4. The 
position of all pieces is then given by a sequence of 64 numbers each 
lying between -6 and +6. A total of 256 bits (binary digits) is sufficient 
memory in this representation. Although not the most efficient encoding, 
it is a convenient one for calculation. One further number λ will be 
+1 or -1 according as it is White’s or Black’s move. A few more should 
be added for data relating to castling privileges (whether the White or 
Black kings and rooks have moved), and en passant captures (e.g., a 
statement of the last move). We will neglect these, however. In this 
notation the starting chess position is given by : 

 4,  2,  3,  5,  6,  3,  2,  4 ; 
 1,  1,  1,  1,  1,  1,  1,  1 ; 
 0,  0,  0,  0,  0,  0,  0,  0 ; 
 0,  0,  0,  0,  0,  0,  0,  0 ; 
 0,  0,  0,  0,  0,  0,  0,  0 ; 
 0,  0,  0,  0,  0,  0,  0,  0 ; 
-1, -1, -1, -1, -1, -1, -1, -1 ; 
-4, -2, -3, -5, -6, -3, -2, -4 ; 
+1 (=λ). 

A move (apart from castling and pawn promotion) can be specified by 
giving the original and final squares occupied by the moved piece. Each 
of these squares is a choice from 64, thus 6 binary digits each is sufficient, 
a total of 12 for the move. Thus the initial move P-K4 would be represented 
by 1,4 ; 3,4. To represent pawn promotion a set of three binary 
digits can be added specifying the piece that the pawn becomes. Castling 
is described by giving the king move (this being the only way the king 
can move two squares). Thus, a move is represented by (a, b, c) where 
a and b are squares and c specifies a piece in case of promotion. 
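
The encoding of positions and moves can be made concrete as follows. This is a sketch only: the function names, the single square index 8·rank + file, the offset of 6 used to store a signed code in four bits, and the use of 0 for "no promotion" are all assumptions of the illustration rather than part of the scheme described.

```python
P, N, B, R, Q, K = 1, 2, 3, 4, 5, 6
BACK_RANK = [R, N, B, Q, K, B, N, R]


def initial_position():
    """64 square codes, White's back rank first, together with lam = +1."""
    board = (BACK_RANK + [P] * 8 + [0] * 32
             + [-P] * 8 + [-piece for piece in BACK_RANK])
    return board, +1


def square_number(rank, file):
    """Single index 0..63 for co-ordinates (rank, file), each in 0..7."""
    return 8 * rank + file


def pack_position(board):
    """Pack the 64 codes at 4 bits each (codes offset by 6 to be non-negative)."""
    packed = 0
    for code in board:
        packed = (packed << 4) | (code + 6)
    return packed


def pack_move(a, b, c=0):
    """6 bits for each square and 3 bits for the promotion piece (0 if none)."""
    return (square_number(*a) << 9) | (square_number(*b) << 3) | c
```

The packed position fits in 256 bits and a packed move in 15, in agreement with the counts given in the text.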

The complete program for a type A strategy consists of nine sub-programs 
which we designate $T_0, T_1, \ldots, T_8$ and a master program 
$T_9$. The basic functions of these programs are as follows : 

$T_0$: Makes move (a, b, c) in position P to obtain the resulting position. 

$T_1$: Makes a list of the possible moves of a pawn at square (x, y) in 
position P. 

$T_2, \ldots, T_6$: Similarly for other types of pieces: knight, bishop, rook, 
queen and king. 

$T_7$: Makes list of all possible moves in a given position. 

$T_8$: Calculates the evaluating function f(P) for a given position P. 

$T_9$: Master program ; performs maximizing and minimizing calculation 
to determine proper move. 



With a given position P and a move (a, b, c) in the internal memory 
of the machine it can make the move and obtain the resulting position 
by the following program $T_0$. 




(1) The square corresponding to number a in the position is located 
in the position memory. 

(2) The number in this square x is extracted and replaced by 0 (empty). 

(3) (a) If x=1, and the first co-ordinate of a is 6 (White pawn being 
promoted) or if x=-1, and the first co-ordinate of a is 1 (Black pawn 
being promoted), the number c is placed in square b (replacing whatever 
was there). 

(b) If x=6 and b-a=2 (White castles, king side) 0 is placed in 
squares 04 and 07 and 6 and 4 in squares 06 and 05, respectively. Similarly 
for the cases x=6, a-b=2 (White castles, queen side) and x=-6, a-b=±2 
(Black castles, king or queen side). 

(c) In all other cases, x is placed in square b. 

(4) The sign of λ is changed. 
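
The steps of this program may be sketched in Python. The sketch assumes a position held as a table from the two-digit square code (10 × first co-ordinate + second co-ordinate) to the piece code, returns a new table rather than altering the old one, and takes the king-side case to be the one in which the king moves from 04 to 06. The rook squares for queen-side castling (file 0 to file 3) follow the ordinary rules of chess and are not spelled out in the text. En passant captures are neglected, as in the text. The names `empty_board` and `make_move` are hypothetical.

```python
def empty_board():
    """Square code 10*rank + file (rank, file in 0..7), every square empty."""
    return {10 * r + f: 0 for r in range(8) for f in range(8)}


def make_move(board, lam, move):
    """Apply move (a, b, c); return the new board and the new value of lam."""
    a, b, c = move
    new = dict(board)
    x = new[a]
    new[a] = 0                                   # step (2)
    if (x == 1 and a // 10 == 6) or (x == -1 and a // 10 == 1):
        new[b] = c                               # step (3a): promotion
    elif abs(x) == 6 and abs(b - a) == 2:        # step (3b): castling
        rank = a - a % 10                        # 0 for White's king, 70 for Black's
        rook = 4 if x > 0 else -4
        old_file, new_file = (7, 5) if b > a else (0, 3)
        new[rank + old_file] = 0                 # rook leaves its corner
        new[rank + new_file] = rook              # and lands beside the king
        new[b] = x
    else:
        new[b] = x                               # step (3c)
    return new, -lam                             # step (4)
```

For example, the move (14, 34, -) applied to a White pawn on square 14 leaves square 14 empty, places the pawn on 34 and changes λ from +1 to -1.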

For each type of piece there is a program for determining its possible 
moves. As a typical example the bishop program, $T_3$, is briefly as 
follows. Let (x, y) be the co-ordinates of the square occupied by the 
bishop. 

(1) Construct (x+1, y+1) and read the contents u of this square in 
the position P. 

(2) If u=0 (empty) list the move (x, y), (x+1, y+1) and start over 
with (x+2, y+2) instead of (x+1, y+1). 

If λu is positive (own piece in the square) continue to 3. 

If λu is negative (opponent’s piece in the square) list the move and 
continue to 3. 

If the square does not exist continue to 3. 

(3) Construct (x+1, y-1) and perform similar calculation. 

(4) Similarly with (x-1, y+1). 

(5) Similarly with (x-1, y-1). 

By this program a list is constructed of the possible moves of a 
bishop in a given position P. Similar programs would list the moves 
of any other piece. There is considerable scope for opportunism in 
simplifying these programs ; e.g., the queen program, $T_5$, can be a 
combination of the bishop and rook programs, $T_3$ and $T_4$. 
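
The bishop program translates almost line for line into a loop over the four diagonal directions. In this sketch the position is assumed to be a table from the square code 10x + y to the piece code, with co-ordinates running from 0 to 7; the function name and the form of the returned list are illustrative. As in the text, no test is made at this stage for whether the move leaves the king in check.

```python
def bishop_moves(board, lam, x, y):
    """List the moves of the bishop on (x, y) for the side lam (+1 or -1)."""
    moves = []
    for dx, dy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        cx, cy = x + dx, y + dy
        while 0 <= cx <= 7 and 0 <= cy <= 7:     # stop when the square does not exist
            u = board[10 * cx + cy]
            if u == 0:
                moves.append(((x, y), (cx, cy)))     # empty: list the move, go on
            else:
                if lam * u < 0:
                    moves.append(((x, y), (cx, cy)))  # opponent's piece: capture
                break                                # any piece ends this diagonal
            cx, cy = cx + dx, cy + dy
    return moves
```

A rook program would differ only in its list of directions, and a queen program could simply join the two lists, which is the kind of simplification mentioned above.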

Using the piece programs $T_1 \ldots T_6$ and a controlling program 
$T_7$ the machine can construct a list of all possible moves in any given 
position P. The controlling program $T_7$ is briefly as follows (omitting 
details) : 

(1) Start at square 1,1 and extract contents x. 

(2) If λx is positive start corresponding piece program $T_x$ and when 
complete return to (1) adding 1 to square number. If λx is zero or negative, 
return to 1 adding 1 to square number. 

(3) Test each of the listed moves for legality and discard those which 
are illegal. This is done by making each of the moves in the position P 
(by program $T_0$) and examining whether it leaves the king in check. 




With the programs T0...T7 it is possible for the machine to play
legal chess, merely making a randomly chosen legal move at each turn
to move. The level of play with such a strategy is unbelievably bad.* The
writer played a few games against this random strategy and was able
to checkmate generally in four or five moves (by fool’s mate, etc.). The
following game will illustrate the utter purposelessness of random play :—


        White (Random)      Black
(1)     P-KN3               P-K4
(2)     P-Q3                B-B4
(3)     B-Q2                Q-B3
(4)     N-QB3               Q x P mate


We now return to the strategy based on an evaluation f(P). The
program T8 performs the function of evaluating a position according
to the agreed-upon f(P). This can be done by the obvious means of
scanning the squares and adding the terms involved. It is not difficult to
include terms such as doubled pawns, etc.

The final master program T9 is needed to select the move according
to the maximizing and minimizing process indicated above. On the
basis of one move (for each side) T9 works as follows :—


(1) List the legal moves (by T7) possible in the present position.

(2) Take the first in the list and make this move by T0, giving position
M1 P.

(3) List the Black moves in M1 P.

(4) Apply the first one giving M11 M1 P, and evaluate by T8.

(5) Apply the second Black move M12 and evaluate.

(6) Compare, and reject the move with the larger evaluation (Black, as the minimizing side, retains the smaller).

(7) Continue with the third Black move and compare with the retained
value, etc.

(8) When the Black moves are exhausted, one will be retained together
with its evaluation. The process is now repeated with the second White
move.


(9) The final evaluations from these two computations are compared 
and the maximum retained. 

(10) This is continued with all White moves until the best is selected
(i.e. the one remaining after all are tried). This is the move to be made.
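
The ten steps can be condensed into a short sketch. It is written in Python purely for illustration; the three callables stand in for the programs T7 (list legal moves), T0 (make a move) and T8 (evaluate), and their signatures are assumptions. The inner comparison is read here as retaining Black's minimizing reply, as the maximizing and minimizing process requires, and the handling of a position with no replies is an addition not discussed in the text.

```python
def best_move_one_deep(position, legal_moves, make_move, evaluate):
    """Choose White's move by maximizing over White moves the minimum over Black replies.

    legal_moves(position) -> list of moves      (role of T7)
    make_move(position, move) -> new position   (role of T0)
    evaluate(position) -> number, White's view  (role of T8)
    """
    best_move, best_value = None, None
    for move in legal_moves(position):
        after_white = make_move(position, move)
        replies = legal_moves(after_white)
        if replies:
            # Steps (3)-(8): keep the Black reply with the smallest evaluation.
            value = min(evaluate(make_move(after_white, r)) for r in replies)
        else:
            # Assumption: with no Black reply, score the position as it stands.
            value = evaluate(after_white)
        # Steps (9)-(10): keep the White move with the largest retained value.
        if best_value is None or value > best_value:
            best_move, best_value = move, value
    return best_move, best_value
```

Only one retained value per level is held at any time, which is why the procedure needs so little working memory.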


* Although there is a finite probability, of the order of 10^-75, that random
play would win a game from Botvinnik. Bad as random play is, there are even
worse strategies which choose moves which actually aid the opponent. For
example, White’s strategy in the following game: 1. P-KB3, P-K4.
2. P-KN4, Q-R5 mate.




These programs are, of course, highly iterative. For that reason they 
should not require a great deal of program memory if efficiently worked out. 

The internal memory for positions and temporary results of calculations
when playing three moves deep can be estimated. Three positions should 
probably be remembered : the initial position, the next to the last, and 
the last position (now being evaluated). This requires some 800 bits. 
Furthermore, there are five lists of moves each requiring about 
30 x 12=360 bits, a total of 1800. Finally, about 200 bits would cover 
the selections and evaluations up to the present calculation. Thus, 
some 3000 bits should suffice. 
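
The estimate can be checked by adding up the quoted figures. The short Python tally below uses only the numbers given in the paragraph; the variable names are illustrative.

```python
position_bits = 800            # three stored positions, as quoted in the text
bits_per_list = 30 * 12        # about 30 moves per list at 12 bits each
list_bits = 5 * bits_per_list  # five lists of moves
scratch_bits = 200             # selections and evaluations so far

total_bits = position_bits + list_bits + scratch_bits
print(bits_per_list, list_bits, total_bits)
```

The tally comes to 2800 bits, which the text rounds up to "some 3000".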


6. IMPROVEMENTS IN THE STRATEGY. 


Unfortunately a machine operating according to this type A strategy 
would be both slow and a weak player. It would be slow since even if 
each position were evaluated in one microsecond (very optimistic) there 
are about 10^9 evaluations to be made after three moves (for each side).
Thus, more than 16 minutes would be required for a move, or 10 hours
for its half of a 40-move game.
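
The timing figures follow from simple arithmetic, reproduced here in Python as a check. The figure of about 30 legal moves per position is an assumption borrowed from the memory estimate (30 moves per list); six half-moves then give roughly 10^9 positions.

```python
moves_per_position = 30                   # assumed average number of legal moves
half_moves = 6                            # three moves for each side
positions = moves_per_position ** half_moves

evaluations = 10 ** 9                     # the rounded figure used in the text
seconds_per_move = evaluations / 1_000_000   # one microsecond per evaluation
minutes_per_move = seconds_per_move / 60
hours_per_game = 40 * seconds_per_move / 3600

print(positions, round(minutes_per_move, 1), round(hours_per_game, 1))
```

With the rounded count this gives a little under 17 minutes per move and about 11 hours for 40 moves, in line with the "more than 16 minutes" and the round figure of 10 hours quoted above.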

It would be weak in playing skill because it is only seeing three moves 
deep and because we have not included any conditions about quiescent 
positions for evaluation. The machine is operating in an extremely 
inefficient fashion—it computes all variations to exactly three moves and 
then stops (even though it or the opponent be in check). A good human 
player examines only a few selected variations and carries these out to
a reasonable stopping-point. A world champion can construct (at best) 
combinations say, 15 or 20 moves deep. Some variations given by 
Alekhine (“‘My Best Games of Chess 1924-1937”) are of this length. 
Of course, only a few variations are explored to any such depth. In 
amateur play variations are seldom examined more deeply than six or 
eight moves, and this only when the moves are of a highly forcing nature 
(with very limited possible replies). More generally, when there are few 
threats and forceful moves, most calculations are not deeper than one or 
two moves, with perhaps half-a-dozen forcing variations explored to
three, four or five moves. 

On this point a quotation from Reuben Fine (Fine 1942), a leading
American master, is interesting: “Very often people have the idea that
masters foresee everything or nearly everything; that when they played
P-R3 on the thirteenth move they foresaw that this would be needed
to provide a loophole for the king after the complications twenty moves
later, or even that when they play 1 P-K4 they do it with the idea of
preventing Kt-Q4 on Black’s twelfth turn, or they feel that everything
is mathematically calculated down to the smirk when the Queen’s Rook
Pawn queens one move ahead of the opponent’s King’s Knight’s Pawn.
All this is, of course, pure fantasy. The best course to follow is to note
the major consequences for two moves, but try to work out forced
variations as they go.”




The amount of selection exercised by chess masters in examining 
possible variations has been studied experimentally by De Groot (1946, b). 
He showed various typical positions to chess masters and asked them 
to decide on the best move, describing aloud their analyses of the positions 
as they thought them through. In this manner the number and depth 
of the variations examined could be determined. Fig. 5 shows the result 
of one such experiment. In this case the chess master examined sixteen 
variations, ranging in depth from 1/2 (one Black move) to 4-1/2 (five 
Black and four White) moves. The total number of positions considered 
was 44. 


Fig. 5. [Diagram not reproduced: the variations examined by the chess master, drawn as a tree with separate line styles for White moves and Black moves, along a horizontal axis of alternating Black (B) and White (W) moves.]


From these remarks it appears that to improve the speed and strength
of play the machine must :—

(1) Examine forceful variations out as far as possible and evaluate 
only at reasonable positions, where some quasi-stability has been 
established. 

(2) Select the variations to be explored by some process so that the 
machine does not waste its time in totally pointless variations. 

A strategy with these two improvements will be called a type B 
strategy. It is not difficult to construct programs incorporating these 
features. For the first we define a function g(P) of a position which 
determines whether approximate stability exists (no pieces en prise,
etc.). A crude definition might be :




g(P) = 1 if any piece is attacked by a piece of lower value, or by more
         pieces than defences, or if any check exists on a square
         controlled by opponent.
g(P) = 0 otherwise.


Using this function, variations could be explored until g(P)=0, always, 
however, going at least two moves and never more than say, 10. 
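
A sketch of this stopping rule, in Python, is given below under stated assumptions: depth is counted in single moves by either side, the bounds 2 and 10 are left as parameters, and g, f, legal_moves and make_move are placeholders supplied by the caller rather than programs defined in the paper.

```python
def explore(position, depth, white_to_move, g, f, legal_moves, make_move,
            min_depth=2, max_depth=10):
    """Explore variations until g(position) == 0, within the depth bounds.

    g(position) -> 1 while the position is unstable, 0 when quiescent
    f(position) -> evaluation from White's point of view
    """
    moves = legal_moves(position)
    quiet = depth >= min_depth and g(position) == 0
    if not moves or quiet or depth >= max_depth:
        return f(position)                      # evaluate only at a stopping point
    values = [
        explore(make_move(position, m), depth + 1, not white_to_move,
                g, f, legal_moves, make_move, min_depth, max_depth)
        for m in moves
    ]
    # White picks the largest value, Black the smallest.
    return max(values) if white_to_move else min(values)
```

The evaluation f is never applied to an unstable position unless the upper depth bound forces it.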

The second improvement would require a function h(P, M) to decide 
whether a move M in position P is worth exploring. It is important that 
this preliminary screening should not eliminate moves which merely look 
bad at first sight, for example, a move which puts a piece en prise; 
frequently such moves are actually very strong since the piece cannot 
be safely taken. 

“Always give check, it may be mate” is tongue-in-cheek advice
given to beginners aimed at their predilection for useless checks. “Always
investigate a check, it may lead to mate” is sound advice for any player.
A check is the most forceful type of move. The opponent’s replies are 
highly limited—he can never answer by counter attack, for example. 
This means that a variation starting with a check can be more readily 
calculated than any other. Similarly captures, attacks on major pieces, 
threats of mate, etc. limit the opponent’s replies and should be calculated 
whether the move looks good at first sight or not. Hence, h(P, M) should 
be given large values for all forceful moves (checks, captures and attacking 
moves), for developing moves, medium values for defensive moves, and 
low values for other moves. In exploring a variation h(P, M) would be 
calculated as the machine computes and would be used to select the 
variations considered. As it gets further into the variation the requirements
on h are set higher so that fewer and fewer subvariations are
examined. Thus, it would start considering every first move for itself,
only the more forceful replies, etc. By this process its computing efficiency 
would be greatly improved. 
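
The rising requirement on h can be illustrated with a small Python sketch. The numerical scores and the linear threshold are hypothetical choices made for the example; the text fixes only the ordering (forceful moves high, defensive moves medium, other moves low) and the idea that the bar rises with depth.

```python
# Hypothetical h values; only their ordering is taken from the text.
FORCEFUL, DEFENSIVE, OTHER = 3, 2, 1


def select_moves(position, moves, h, depth, base=1, step=1):
    """Keep the moves whose h(position, move) meets a threshold that rises with depth."""
    threshold = base + step * depth
    return [m for m in moves if h(position, m) >= threshold]
```

With these defaults every move is considered at depth 0, forceful and defensive moves at depth 1, and only forceful moves at depth 2.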

It is believed that an electronic computer incorporating these two 
improvements in the program would play a fairly strong game, at 
speeds comparable to human speeds. It may be noted that a machine 
has several advantages over humans :— 

(1) High-speed operation in individual calculations. 

(2) Freedom from errors. The only errors will be due to deficiencies 
of the program while human players are continually guilty of very 
simple and obvious blunders. 

(3) Freedom from laziness. It is all too easy for a human player to 
make instinctive moves without proper analysis of the position. 

(4) Freedom from ‘nerves’. Human players are prone to blunder
due to over-confidence in “won” positions or defeatism and
self-recrimination in “lost” positions.

These must be balanced against the flexibility, imagination and 
inductive and learning capacities of the human mind. 




Incidentally, the person who designs the program can calculate 
the move that the machine will choose in any position, and thus in a 
sense can play an equally good game. In actual fact, however, the 
calculation would be impractical because of the time required. On a 
fair basis of comparison, giving the machine and the designer equal 
time to decide on a move, the machine might well play a stronger game. 


7. VARIATIONS IN PLAY AND IN STYLE.


As described so far the machine once designed would always make the 
same move in the same position. If the opponent made the same moves 
this would always lead to the same game. It is desirable to avoid this, 
since if the opponent wins one game he could play the same variation
and win continuously, due perhaps to some particular position arising 
in the variation where the machine chooses a very weak move. 

One way to prevent this is to have a statistical element in the machine. 
Whenever there are two or more moves which are of nearly equal value 
according to the machine’s calculations it chooses from them at random. 
In the same position a second time it may then choose another in the set. 
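
A statistical element of this kind might look as follows in Python. The tolerance that defines "nearly equal" is an assumed parameter, and the list of (move, value) pairs is an illustrative input format.

```python
import random


def choose_move(scored_moves, tolerance=0.1, rng=random):
    """Pick at random among the moves whose value is within `tolerance` of the best.

    scored_moves: list of (move, value) pairs produced by the search.
    """
    best = max(value for _, value in scored_moves)
    candidates = [move for move, value in scored_moves if best - value <= tolerance]
    return rng.choice(candidates)
```

Clearly inferior moves are never selected, so the variation costs little in strength while making the machine's play unpredictable.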

The opening is another place where statistical variation can be intro- 
duced. It would seem desirable to have a number of the standard 
openings stored in a slow-speed memory in the machine. Perhaps a few 
hundred would be satisfactory. For the first few moves (until either the 
opponent deviates from the “ book” or the end of the stored variation 
is reached) the machine plays by memory. This is hardly “ cheating ” 
since that is the way chess masters play the opening. 
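
Playing "by memory" amounts to a table look-up keyed on the moves made so far. The Python sketch below is illustrative only: the two stored lines are placeholders, not a recommended repertoire, and the names BOOK and book_move are assumptions.

```python
import random

# Hypothetical opening book: sequence of moves so far -> stored continuations.
BOOK = {
    (): ["P-K4", "P-Q4"],
    ("P-K4", "P-K4"): ["N-KB3"],
}


def book_move(history, book=BOOK, rng=random):
    """Return a stored continuation, or None once the game has left the book."""
    replies = book.get(tuple(history))
    if not replies:
        return None
    return rng.choice(replies)
```

When book_move returns None, either the opponent has deviated or the stored variation has ended, and the machine falls back on calculation.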

It is interesting that the “style” of play of the machine can be changed
very easily by altering some of the coefficients and numerical factors
involved in the evaluation function and the other programs. By
placing high values on positional weaknesses, etc. a positional-type player
results. By more intensive examination of forced variations it becomes
a combination player. Furthermore, the strength of the play can be
easily adjusted by changing the depth of calculation and by omitting 
or adding terms to the evaluation function. 

Finally we may note that a machine of this type will play “brilliantly”
up to its limits. It will readily sacrifice a queen or other piece in order 
to gain more material later or to give checkmate provided the completion 
of the combination occurs within its computing limits. 

The chief weakness is that the machine will not learn by its mistakes. 
The only way to improve its play is by improving the program. 
Some thought has been given to designing a program which is self- 
improving but, although it appears to be possible, the methods thought 
of so far do not seem to be very practical. One possibility is to have a 
higher level program which changes the terms and coefficients involved 
in the evaluation function depending on the results of games the machine 
has played. Small variations might be introduced in these terms and
the values selected to give the greatest percentage of “wins”.




8. ANOTHER TYPE OF STRATEGY. 

The strategies described above do not, of course, exhaust the possi- 
bilities. In fact, there are undoubtedly others which are far more efficient 
in the use of available computing time on the machine. Even with the 
improvements we have discussed the above strategy gives an impression 
of relying too much on “brute force” calculations rather than on logical
analysis of a position. It plays something like a beginner at chess who 
has been told some of the principles and is possessed of tremendous 
energy and accuracy for calculation but has no experience with the game. 
A chess master, on the other hand, has available knowledge of hundreds
or perhaps thousands of standard situations, stock combinations, and
common manoeuvres which occur time and again in the game. There
are, for example, the typical sacrifices of a knight at B7 or a bishop at
R7, the standard mates such as the “Philidor Legacy”, manoeuvres
based on pins, forks, discoveries, promotion, etc. In a given position
he recognizes some similarity to a familiar situation and this directs his
mental calculations along lines with greater probability of success.

There is no reason why a program based on such “type positions”
could not be constructed. This would require, however, a rather for- 
midable analysis of the game. Although there are various books analysing 
combination play and the middle game, they are written for human 
consumption, not for computing machines. It is possible to give a person 
one or two specific examples of a general situation and have him under- 
stand and apply the general principle involved. With a computer an 
exact and completely explicit characterization of the situation must be 
given with all limitations, special cases, etc. taken into account. We 
are inclined to believe, however, that if this were done a much more 
efficient program would result. 

To program such a strategy we might suppose that any position
in the machine is accompanied by a rather elaborate analysis of the
tactical structure of the position suitably encoded. This analytical data
will state that, for example, the Black knight at B3 is pinned by a bishop,
that the White rook at K1 cannot leave the back rank because of a
threatened mate on B8, that a White knight at R4 has no move, etc.; in
short, all the facts to which a chess player would ascribe importance in
analysing tactical possibilities. These data would be supplied by a
program and would be continually changed and kept up-to-date as
the game progressed. The analytical data would be used to trigger
various other programs depending on the particular nature of the
position. A pinned piece should be attacked. If a rook must guard
the back rank it cannot guard the pawn in front of it, etc. The machine
obtains in this manner suggestions of plausible moves to investigate.

It is not being suggested that we should design the strategy in our own 
image. Rather it should be matched to the capacities and weaknesses 
of the computer. The computer is strong in speed and accuracy and 
weak in analytical ability and recognition. Hence, it should make more


SER. 7, VOL. 41, NO. 314.—MARCH 1950 x 




use of brutal calculations than humans, but with possible variations
increasing by a factor of 10^3 every move, a little selection goes a long
way toward improving blind trial and error.


ACKNOWLEDGMENT. 


The writer is indebted to E. G. Andrews, L. N. Enequist and H. E.
Singleton for a number of suggestions that have been incorporated in 
the paper. 


October 8, 1948. 


APPENDIX. 


THE EVALUATION FUNCTION FOR CHESS. 


The evaluation function f(P) should take into account the “long term”
advantages and disadvantages of a position, i.e. effects which may be
expected to persist over a number of moves longer than individual 
variations are calculated. Thus the evaluation is mainly concerned with 
positional or strategic considerations rather than combinatorial or tactical 
ones. Of course there is no sharp line of division; many features of a 
position are on the borderline. It appears, however, that the following 
might properly be included in f(P) :— 

(1) Material advantage (difference in total material).

(2) Pawn formation :

(a) Backward, isolated and doubled pawns.
(b) Relative control of centre (pawns at K4, Q4, B4).
(c) Weakness of pawns near king (e.g. advanced KNP).
(d) Pawns on opposite colour squares from bishop.
(e) Passed pawns.

(3) Positions of pieces :

(a) Advanced knight (at K5, Q5, B5, K6, Q6, B6), especially if
protected by pawn and free from pawn attack.
(b) Rook on open file, or semi-open file.
(c) Rook on seventh rank.
(d) Doubled rooks.

(4) Commitments, attacks and options :

(a) Pieces which are required for guarding functions and, therefore,
committed and with limited mobility.
(b) Attacks on pieces which give one player an option of exchanging.
(c) Attacks on squares adjacent to king.
(d) Pins. We mean here immobilizing pins where the pinned
piece is of value not greater than the pinning piece; for
example, a knight pinned by a bishop.

(5) Mobility.




These factors will apply in the middle game: during the opening and 
end game different principles must be used. The relative values to be 
given each of the above quantities is open to considerable debate, and 
should be determined by some experimental procedure. There are also 
numerous other factors which may well be worth inclusion. The more 
violent tactical weapons, such as discovered checks, forks and pins by a
piece of lower value are omitted since they are best accounted for by
the examination of specific variations.
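
One simple way to combine such factors is a weighted sum of White-minus-Black feature counts. The Python sketch below shows only that form: the feature names and the weights are hypothetical, since the text leaves the relative values to be settled by experiment.

```python
def evaluate(white_features, black_features, weights):
    """Weighted sum of (White count - Black count) for each named feature.

    A positive result favours White. Features missing from a side count as 0.
    """
    return sum(
        weight * (white_features.get(name, 0) - black_features.get(name, 0))
        for name, weight in weights.items()
    )


# Hypothetical weights, for illustration only.
WEIGHTS = {"material": 1.0, "doubled_pawns": -0.5, "mobility": 0.1}
```

Changing the entries of the weight table is all that is needed to alter the balance between material and positional terms.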


REFERENCES. 


CHERNEV, 1937, Curious Chess Facts, The Black Knight Press.

DE GROOT, A. D., 1946a, Het Denken van den Schaker, 17-18, Amsterdam; 1946b,
Ibid., Amsterdam, 207.

FINE, R., 1942, Chess the Easy Way, 79, David McKay.

HARDY and WRIGHT, 1938, The Theory of Numbers, 116, Oxford.

VON NEUMANN and MORGENSTERN, 1944, Theory of Games, 125, Princeton.

VIGNERON, H., 1914, Les Automates, La Natura.

WIENER, N., 1948, Cybernetics, John Wiley.




XXIII. Propagation of Electromagnetic Disturbances along a Thin Wire 
in a Horizontally Stratified Medium. 


By B. L. Coleman, B.Sc.,
British Electrical and Allied Industries Research Association *. 


[Received October 11, 1949.] 


SUMMARY. 

It is shown that for a thin wire buried at finite depth in a semi-infinite 
homogeneous medium and subject to a disturbance at a given frequency, 
there exists an exponential attenuation of the current in the wire with a 
propagation constant equal to that of the medium. This result is extended 
to a medium with a number of layers, and to a wire lying in an interface. 


LIST OF PRINCIPAL SYMBOLS.


All quantities are in M.K.S. rationalized units (Stratton 1941). Time-wise
variation is as $e^{j\omega t}$, this factor being omitted for brevity.


$x, y, z$; $r, \theta, z$ = Corresponding cartesian and cylindrical polar coordinate systems.

$\sigma$ = Conductivity of medium.

$\mu$ = Permeability of medium.

$\varepsilon$ = Permittivity of medium.

$\omega$ = $2\pi \times$ frequency.

$\gamma$ = Propagation constant of medium, $[j\omega\mu(\sigma+j\omega\varepsilon)]^{1/2}$.

$\Gamma(\omega)$ = Propagation constant of disturbance along the wire, at angular
pulsation $\omega$.

$a$ = Radius of wire.

$h$ = Height of wire above interface.

$V(x)$ = Potential at $x$ with respect to that at a point at infinity.

$E(x)$ = Potential gradient at $x$.

$\mathbf{E}$ = Electric intensity vector; components $E_x$, $E_y$, $E_z$.

$\mathbf{H}$ = Magnetic intensity vector; components $H_x$, $H_y$, $H_z$.

$\mathbf{\Pi}$ = Hertzian vector potential; components $\Pi_x$, $\Pi_y$, $\Pi_z$.

$I(x)$ = Current in wire at $x$.

$g$ = Complex leakage conductance per unit length.

$Z$ = Complex internal impedance per unit length.

$J_n(z)$ = Bessel function of first kind of order $n$ (McLachlan 1941).

$K_n(z)$ = Modified Bessel function of second kind of order $n$.

UTA ey variables. 

$\alpha$ = $(u^2+\gamma^2)^{1/2}$.

P =2ryf. 


The convention is used throughout that the root sign, as in $(u^2+\gamma^2)^{1/2}$,
or in $(\lambda^2)^{1/2}$, shall mean that root whose real part is non-negative.
Subscripts 0 and 1 refer to the upper and lower media respectively.


* Communicated by Dr. S. Whitehead.




§1. INTRODUCTION.


THE problem of propagation along a buried wire has received attention
in recent years in connection with the lightning protection of overhead
transmission lines by buried wires (counterpoise) (Sunde 1940), and in the 
design of protective methods for buried power and telephone cables 
(Sunde 1945). 

The propagation of surges can be calculated from steady state propagation 
by the use of operational methods such as the Fourier Integral (Campbell 
and Foster), so that it is necessary to solve only the latter case. Assume 
a current at a given frequency to be injected into the wire at a given 
point ; as it proceeds along the wire it leaks to earth, and an associated 
system of travelling electromagnetic waves is set up in accordance with 
Maxwellian theory. If the wire, which is assumed infinite in both directions,
is buried at infinite depth in an homogeneous medium, the attenuation
of the current follows the law :


$$I(x)=\pm\tfrac{1}{2}I(0)\exp\{-\gamma|x|\}, \tag{1}$$


where y is the propagation constant of the medium. The problem is 
relatively simple compared with that discussed here, because of the 
inherent symmetry of the system. The simpler result has been extended 
to the case of a wire lying in the interface between two media of different 
characteristics, assuming the conductivity and permittivity of the upper 
medium (air) are negligible, and the result is 


$$I(x)=\pm\tfrac{1}{2}I(0)\exp\{-\gamma|x|/\sqrt{2}\}, \tag{2}$$


where y is the propagation constant of the lower medium (earth). It 
has been further suggested (Sunde 1945) that in this case, for moderate 
depths of burial of the wire, propagation is given approximately by 
equation (2). However, the method of proof of these results is open to 
criticism on mathematical grounds. 
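
The two exponential laws are easy to evaluate numerically. The Python sketch below assumes the propagation constant of the symbol list, $\gamma=[j\omega\mu(\sigma+j\omega\varepsilon)]^{1/2}$, and the attenuation factors $\exp\{-\gamma|x|\}$ and $\exp\{-\gamma|x|/\sqrt{2}\}$ as read from equations (1) and (2). The free-space constants are standard values; the soil conductivity and frequency in the usage note are arbitrary illustrative figures, not data from the paper.

```python
import cmath
import math

MU0 = 4e-7 * math.pi      # permeability of free space, H/m
EPS0 = 8.854e-12          # permittivity of free space, F/m


def propagation_constant(freq_hz, sigma, mu=MU0, eps=EPS0):
    """gamma = [j w mu (sigma + j w eps)]^(1/2), root with non-negative real part."""
    w = 2 * math.pi * freq_hz
    # cmath.sqrt returns the principal root, whose real part is non-negative.
    return cmath.sqrt(1j * w * mu * (sigma + 1j * w * eps))


def current_fraction(gamma, x, at_interface=False):
    """Magnitude of the attenuation factor at distance x along the wire.

    Deep burial uses exp(-gamma|x|); a wire in the earth-air interface
    uses exp(-gamma|x|/sqrt(2)).
    """
    g = gamma / math.sqrt(2) if at_interface else gamma
    return abs(cmath.exp(-g * abs(x)))
```

For example, propagation_constant(1000.0, 0.01) gives the constant for a medium of conductivity 0.01 S/m at 1000 c/s, and current_fraction then shows that the current at a given distance is larger for the wire in the interface than for the deeply buried wire.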

This paper deals with the general question of propagation along a wire 
buried at a finite depth in an earth consisting of a number of horizontal 
stratifications or layers, of which the air can be considered as one, of 
individual characteristics $\sigma$, $\mu$, $\varepsilon$, each or all of which may differ between
the various layers. It is shown that for thin wires there exists an
exponential mode of propagation of the form $\exp\{-\Gamma|x|\}$, where the
propagation constant $\Gamma$ is determined as follows :


(1) If the wire is totally immersed in a layer of constant $\gamma$, that is, if
it is more than a few diameters away from the nearest interface, then $\Gamma=\gamma$.


(2) If the wire lies at the interface between layers $m$ and $m+1$


$$\Gamma^2=\frac{\mu_{m+1}\gamma_m^2+\mu_m\gamma_{m+1}^2}{\mu_m+\mu_{m+1}}. \tag{3}$$


Writing $\gamma_m=0$, $\mu_m=\mu_{m+1}$, equation (2) follows.




The proof as given below is for the case of a two-layer medium, but from 
its methods, and the nature of the result, the above generalization is 
immediate (§5). It follows the methods used by Sunde (1936, 1940), 
extending his results and putting them on a rigorous mathematical 
foundation. 


§2. THE FIELD OF A DOUBLET.


Fig. 1. [Diagram not reproduced.]


The doublet is supposed at height $h$ above the interface between two
semi-infinite homogeneous media specified by $\sigma_0, \mu_0, \varepsilon_0$ and $\sigma_1, \mu_1, \varepsilon_1$, as
shown in fig. 1. It is of length $ds$, in the direction of the $x$ axis with its
mid-point on the $z$ axis at $(0, 0, h)$, and carrying unit current. If the
two media were identical its field would be completely specified by a
Hertzian vector potential everywhere in the $x$ direction of magnitude


$$A\int_0^\infty \exp\{-\alpha_0|z-h|\}\,\frac{u}{\alpha_0}\,J_0(ru)\,du, \tag{4}$$

where

$$A=\frac{j\omega\mu_0\,ds}{4\pi\gamma_0^2},$$

and $(r, \theta, z)$ is the associated cylindrical polar coordinate system. When
the media are as shown (4) gives the primary field in the upper region,
and the complete field is obtained by superimposing suitable secondary
or reflection fields in the upper and lower media. These reflection fields
are similarly obtainable from Hertzian vector potentials, and symmetry 
with respect to the x-z plane restricts such vectors to the x and z directions, 
so that the complete field can be specified by 


$$\Pi_{0x}=\Pi'_{0x}+\Pi''_{0x},\quad \Pi_{0z};\qquad \Pi_{1x},\quad \Pi_{1z} \tag{5}$$




in the two media. The secondary components are everywhere finite and
continuous, since the only permissible singularity is the doublet allowed
for in $\Pi'_{0x}$. The field intensities are given by


ee nor. 


The secondary vector components satisfy Poisson’s equation 


Or Or Or 
Baap ee ck, ENR ce se es (7) 


and the appropriate general solution is (Stratton 1941, §6°9; eq. (55)) 
n=+ToO . ee 
a= 2 exp{njo}| [9,(w) exp{—az}+f,(u) exp {az}]uJ,,(ru) du. . (8) 
n=— 0 0 
The boundary conditions at z=0 consist of the continuity of the 


tangential $(x, y)$ components of $\mathbf{E}$ and $\mathbf{H}$. From the nature of the
expansions (4) and (8), these conditions may be differentiated or integrated
with respect to $x$ and $y$ in the plane $z=0$, and since the potentials must
vanish at infinity in this plane the “functional constants” of integration
must vanish. The conditions may therefore be taken as


20 

2 2 

ore Oz a Ee 12? (9) 
Jo JOP 

ve O79 was yi Ors (10) 
Jopty Oz jJwp, 02 

Oty 4 |, Oz re Om, , OM, 

Ox ¥ Gz = Ox i oz” a 
Veto x eT, ne en eae anne sk 2) 


Taking the appropriate forms of (5), namely 


n=+ 0 © 
T= 2 exp {njo} I, fon(u) exp {—%z}uJ,(ru) du, . (13) 


n=— 0 



etc., together with (4), equations (10) and (12) give 

” ale Aexp{— Aol 4% Ho%1 
0 


exp {—a,z}ud (ru) du, . (14) 
Op fh qT bgt P {—%z}uJo 


2 
= 2Y0 9 xp {—oh}.— exp {a,z}uJ (ru)du, . (15) 
Te {e we exp {—%h} aes P {x1z}ud 
Noting that 


2 Jy(ru)= —cos dud ,(ru)= — (exp {j0}+exp {—j6})uJ (ru), . (16) 


equations (9) and (11) yield 


0 (* 2yo—7i)HaboA exp {—%h . 
ee eet ee J a OR 0) a) 
ave vo (poy % qt boos) (Yop 1% + yjU9%) Pt g j 
é / ° 2yo(yo—V)HIA exp {—a%h} = 
= 2 cielo aI SEP o's oxpfageyudo(ru) du. (18) 
ane 02 J 9 Y2(M4%tHo%1) (Yoo 1%1 +-ViH0%0) Piaoz}udo 


The electric intensity in the x direction in the plane z=h is given by 


B,=dlP(r)+ Qn), - - Oy, 


where 


P(r)== BEEN = E a pal a oe 2agl) | ude uJo(ru) du, . (20) 
0 


4a at My %qHo% 
Es (Yo- i) JOH HG i a Exp {—2aph}uI (ru) du ae 
WE ons A Jr errr SE WT $$. . (21) 
oO ye aoa 2ry5 (145% 9+ oo%s)(YoH1% +77 Mo%0) 


In the general case of a stratified earth with a number of layers, 
additional terms must be added to the components of the vector potential 
to represent higher order reflection fields, and such terms must again 
satisfy Poisson’s equation and be free from singularities in the layer or 
layers containing the dipole. The effect of such terms on the solution is 
discussed in §5. 


§3. PROPAGATION ALONG AN INFINITE WIRE (Sunde 1936). 


The wire is supposed embedded in the upper medium at height h from 
the interface with its axis along the $x$ axis, and the origin of coordinates
is taken at the point on it opposite the centre of the applied disturbance
(e.g. if current is fed directly into the wire the origin is the point of feed).
The wire is of radius $a$ and internal impedance $Z$ (complex) per unit
length which is supposed constant along its length for a given frequency. 
To account for imperfect contact it is supposed surrounded by a shell of 
negligible thickness and of complex leakage conductance g per unit 
length, again supposed constant along its length for a given frequency. 

Suppose the applied disturbance gives rise to an electric intensity
$E^0(x)$ along the surface of the wire.




Let $V_w(x)$ = potential of wire at $(x, a, h)$,
$V_m(x)$ = potential of the medium at an adjacent point,
$I(x)$ = total current in the wire at $x$.
Then 
ea (2) 
Vie) Ve eg ae es EAT ees 2), 
m(2) w(X) gy dx (2 ) 
Hite) 


ve 


Sa, Mo,E o. 


vo 


6,4, €, 


Let $E_w(x)$ = potential gradient along the wire at $(x, a, h)$,
$E_m(x)$ = potential gradient in the medium at an adjacent point in
the $x$ direction.


Then 

d?I (a) 
da? 
Now by definition of Z if the wire is sufficiently thin for E,(x) to be 

constant over its surface, E,,(«)=ZI(z). Also E,,(x) can be split into 

two parts, E°(x) the applied gradient and E ,(«) the gradient due to currents. 

in the surrounding media. Hence 


E,,(t)—E,(«)=—97} Be(e2) 


ere St 
Get Oo) _F1(e)+ Blo) =— EM). . Os tee een (24) 
Fig. 3. 
PlaneZ-h 
Y eae 


a elag 
~ é 
x ——4 
It remains to express E,(x) in terms of I(x). With the aid of the- 


results obtained in §2 E,(x) can be calculated in the plane z=A as follows: 
referring to fig. 3 consider the effect of a current element at point (7, 0, /)) 


-on the electric field at (x, a, h), and let X=a—v, s=(X2+a?)!, Then 
0” : 
dE, («%)=1(r) dr | Po) + axe ae) | 15 SPER, taney 
where the P and Q functions are given by (20) and (21). 
+o + 0 o2 
Ee)=[_— P(s)l(r) r+ Ur) 55 Qe) de 
=T,+T, (say). Me Ve eds hes eg TS Slee 


‘The current can be expressed by a Fourier Integral as 


I(2)= |” FO) exp {jx} [is pe mera 


where A is real on the path of integration. Then 


+0 +0 
«T= drP(s)| F(A)expfjar}dA 2... (28) 


—o 


+0 + 
a | dF (A) { P(s) exp{jAr}dr 


writing 
t=2-+0v, b=(v?+a?)!, 
+ 0 +0 
ee AXE(2)exp{jx} | P(b) exp{jAv} dv. . . (29) 
In a simil r, noting that Boe it foll that 
na similar manner, noting that => = = it follows tha 
+ 00 T.:2 
Teeses | dE (A) exp{ jax}? Q(b) exp{jrv} dv, (30) 
writing 
+o 
p(A, a)=—| P(b) exp{jrv} dv, . . . (32) 
+ 00 : 
(A, a) =| Q(6)-exp {7A0} dv; ie nea oetae) 


and expressing E(x) as a Fourier integral 
H%(e)=[ H(A) exp {9A} da, PNAS ale © eB) 
equation (24) becomes pe 
[= Fopexp( Me} AIG +G0A, «HZ +p, a) dd 
=|" AOvexp {ira} ar “one, WR (3.4) 


ep eg ER 
* For a discussion of the inversion of this and other integrals see Appendix. 




Writing 
A(A)=¥[g-1+-q(A, a)|+[Z+p(, a), . 2. (85) 
then 
ro) = 20) 
“Or 
+ hir 
We)=| Foy exp (ie) aa. Steere Sah -(36) 


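From equations (20) and (21), $P(b)$ and $Q(b)$ are integrals of the form
\[ N(b) = \int_0^{\infty} F(u^2)\,u\,J_0(bu)\,du, \tag{37} \]
and hence $p(\lambda, a)$, $q(\lambda, a)$ are integrals of the form
\[ G = \int_{-\infty}^{+\infty} N[(v^2+a^2)^{1/2}]\exp\{j\lambda v\}\,dv, \]
where as previously remarked, $\lambda$ is a real variable. In accordance with
the notes in the Appendix it follows that
\[ G = \int_0^{\infty} du\,F(u^2)\,u\int_{-\infty}^{+\infty}\exp\{j\lambda v\}\,J_0[u(a^2+v^2)^{1/2}]\,dv \]
\[ \phantom{G} = 2\int_{|\lambda|}^{\infty} F(u^2)\,u\,\frac{\cos[a(u^2-\lambda^2)^{1/2}]}{(u^2-\lambda^2)^{1/2}}\,du \]
\[ \phantom{G} = 2\int_0^{\infty} F(u^2+\lambda^2)\cos au\,du \]
\[ \phantom{G} = \int_{-\infty}^{+\infty}\exp\{jau\}\,F(u^2+\lambda^2)\,du. \tag{38} \]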


(These transformations follow from Campbell and Foster, pair 915.4.) 
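
The middle steps of (38) can be spelled out as a worked substitution. The sketch below is not taken from the paper: it assumes the standard discontinuous-integral result for the inner integral (taken here to be what the cited Campbell and Foster pair supplies) and introduces an auxiliary variable $w$ purely for illustration.

```latex
% Assumed standard result, for real \lambda and u > 0:
\int_{-\infty}^{+\infty} e^{j\lambda v}\, J_0\!\left[u(a^2+v^2)^{1/2}\right] dv =
\begin{cases}
  \dfrac{2\cos\!\left[a(u^2-\lambda^2)^{1/2}\right]}{(u^2-\lambda^2)^{1/2}}, & u > |\lambda|, \\[2ex]
  0, & u < |\lambda| .
\end{cases}

% Hypothetical auxiliary variable: w^2 = u^2 - \lambda^2, so that u\,du = w\,dw
% and u = |\lambda| corresponds to w = 0.
G = 2\int_{|\lambda|}^{\infty} F(u^2)\,\frac{\cos(aw)}{w}\,u\,du
  = 2\int_{0}^{\infty} F(w^2+\lambda^2)\cos(aw)\,dw
  = \int_{-\infty}^{+\infty} e^{jaw}\,F(w^2+\lambda^2)\,dw .
% The last step uses only the evenness of F(w^2+\lambda^2) in w.
```

Renaming $w$ back to $u$ gives the final form of (38).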
Hence 


3 ) Fimce 6 2 
pdja)="SE8 |" exp{Jau} (wt ?-+78)4 


pa (ut +6) Holt YY exp {—2h (uP a2 2)! 
x{ + (LIE LB) pg yAye OP A 2h(uP+- d°-+-yo)* ¢ du 
(39) 


with a similar expression for $q(\lambda, a)$, these expressions holding for real
values of $\lambda$. Consider them now as functions of a complex variable $\lambda$;
then as $|\lambda| \to 0$


pA, a)> ae : exp {jau}(u?-+ vay a. ean exp {2h (u8-+ 29) ] du 


TT 


= Oe [ Kofa8) | a | yo esr Ky ((eaa+8)) | foray i= (£0) 


(See Campbell and Foster, pairs 868 and 558, and McLachlan, list of formulae
No. 111.)

Similarly $q(\lambda, a) \to 0$ as $\lambda \to 0$. If now $E^0(x)$ fulfils certain conditions
(Whittaker and Watson 1940), equation (36) can be evaluated by the
theory of residues. Suppose the poles of the numerator and zeros of the
denominator are given by $\lambda_i$, $\lambda_k$ ($i, k = 1, 2, 3, \ldots$) where the integrand is
now considered as a function of a complex variable $\lambda$, and $i$, $k$ refer to the
upper and lower halves of the $\lambda$ plane respectively, it follows that


\[ I(x) = 2\pi j\sum_i R_i \qquad (x > 0), \tag{42} \]
\[ I(x) = -2\pi j\sum_k R_k \qquad (x < 0), \tag{43} \]


where $R_i$, $R_k$ are the residues of the integrand at $\lambda_i$, $\lambda_k$ respectively. It
has been assumed that there are no real poles or roots, but the corresponding 
modifications are simple, and in any case it is shown below that it is not 
necessary in general to consider this possibility. (42) and (43) give the 
formal solutions of the problem and particular cases of interest are 
discussed in the next section. 
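
As an illustration of evaluating a Fourier integral of the type (36) by residues, the following minimal sketch compares a brute-force numerical integral with the residue result. It does not use the paper's $H(\lambda)/h(\lambda)$: the integrand $1/(\lambda^2+\gamma^2)$ with a real positive `gamma` is a stand-in chosen only because it has one simple pole in each half-plane, and all function names are hypothetical.

```python
import math


def truncated_fourier_integral(x, gamma, half_width=200.0, step=0.01):
    """Trapezoidal estimate of the integral over real lam of
    exp(j*lam*x) / (lam**2 + gamma**2), truncated to |lam| <= half_width.

    The sine (imaginary) part is odd in lam and cancels, so only the
    cosine part is summed.
    """
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        lam = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * math.cos(lam * x) / (lam * lam + gamma * gamma)
    return total * step


def residue_evaluation(x, gamma):
    """Same integral by residues.

    For x > 0 the contour is closed in the upper half-plane (pole at
    lam = +j*gamma); for x < 0 in the lower half-plane (pole at
    lam = -j*gamma, with the extra minus sign for the clockwise sense).
    Both cases reduce to (pi/gamma) * exp(-gamma*|x|).
    """
    return (math.pi / gamma) * math.exp(-gamma * abs(x))


if __name__ == "__main__":
    for x, gamma in [(1.0, 1.0), (-2.0, 0.5)]:
        print(x, gamma, truncated_fourier_integral(x, gamma), residue_evaluation(x, gamma))
```

In this toy case a pole pair at $\lambda = \pm j\gamma$ already yields a current that decays exponentially on either side of the origin.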


§4. THE PRINCIPAL MODE OF PROPAGATION.
(4.1) Wire not in Interface.
In this case $h \neq 0$. Consider first the solutions corresponding to the
zeros of the function $h(\lambda)$, i.e. to the roots of
\[ \lambda^2 = -\frac{Z+p(\lambda, a)}{g^{-1}+q(\lambda, a)}. \tag{44} \]
$p(\lambda, a)$ is given by (39), which can be written
wW 
pd, a) = oR TOe+y Fat}, . . . (45) 
where 
pal” MOE EB al ED em (— 2A oe gy 
Jo py (UP+ A2 +8) 4+ pg(U? + A? +-y2)4 (u2-+ 2 +-y2)8 


Suppose $a$ to decrease indefinitely; it is seen by the theory of residues
that $T$ remains finite, whilst $K_0[(\lambda^2+\gamma_0^2)^{1/2}|a|] \to \infty$, and hence as
$a \to 0$, $|p(\lambda, a)| \to \infty$. Returning to equations (20) and (21) it follows by
an exactly similar argument that $q(\lambda, a) \to p(\lambda, a)/\gamma_0^2$, and supposing
$Z$ and $g^{-1}$ finite it follows that the roots are given by
\[ \lambda^2 = -\gamma_0^2, \tag{46} \]
and the corresponding component of $I(x)$ is given by
\[ I(x) \to I(0)\exp\{-\gamma_0|x|\}, \tag{47} \]
where in this expression $I(0)$ as obtained by letting $x$ tend to zero may be
different according as $x$ is +ve or −ve. For example, in the case of
current fed into the wire at $x=0$, it is obvious by symmetry that
$I(0+) = -I(0-)$. Furthermore, examination of the corresponding residues
$R_i$ and $R_k$ shows that $H(\lambda)$ must have a singularity at $\lambda^2 = -\gamma_0^2$, and
this places an important restriction on the type of excitation which will
cause a disturbance in the wire, though it is a restriction which is satisfied
in many practical cases (Sunde 1936).

The remaining modes of propagation correspond to the poles of $H(\lambda)$
other than these, and it is seen that as $a \to 0$, $\lambda^2 \neq -\gamma_0^2$, propagation is prevented
by the infinity assumed by $h(\lambda)$, so that for thin wires propagation will
be exponential with a constant given by $\Gamma = \gamma_0$ (see §1(1)).
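
The principal-mode result (47) can be sketched numerically as follows. This is only an illustration: the value of `gamma` is an arbitrary assumed complex propagation constant, not one derived from the paper's medium parameters, and the function names are hypothetical. The sign of $I(0)$ on each side of the feed point is left to the caller, since the text notes that $I(0+)$ and $I(0-)$ may differ.

```python
import cmath
import math


def principal_mode_current(i0, gamma, x):
    """Current at distance x from the origin for exponential propagation,
    I(x) = I(0) * exp(-gamma * |x|), with gamma a complex constant whose
    real part gives attenuation and imaginary part gives phase change."""
    return i0 * cmath.exp(-gamma * abs(x))


def attenuation_db_per_unit_length(gamma):
    """Amplitude attenuation in decibels per unit length, 20*log10(e)*Re(gamma)."""
    return 20.0 * math.log10(math.e) * gamma.real


if __name__ == "__main__":
    gamma = complex(0.1, 0.2)  # assumed value, per unit length
    for x in (0.0, 5.0, 10.0):
        current = principal_mode_current(1.0, gamma, x)
        print(x, abs(current), cmath.phase(current))
    print(attenuation_db_per_unit_length(gamma))
```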


(4.2) Wire in the Interface. 

Similar arguments concerning the possible modes of propagation
corresponding to the poles of $H(\lambda)$ at values of $\lambda$ other than the roots of
$h(\lambda)=0$ still apply, and it is only necessary to consider the roots of this
equation, viz. (44). In this case $h=0$ and (20) and (21) now give

TJeHots (° 
P(b) = | 5s naa 
©) 2a 0 Ho%1 +H 1%0 
Jopoys (°F veri (Yo= VE iHo | 
Q(b) = ead | ae Se laid p(B) dita (49) 
©) 20r(y,MS—YOHT) J 0 Lea%o+Ho%1 YOM FY iMo%o ; 

Proceeding as above it is necessary to evaluate integrals of the form 

il 


uJ (bu) du (48) 


+0 
G= | orlian apE Tp R eT + 00) 
where B) and ay are complex constants. Now 
+ 00 y2—- )2-Ly2)8 yet p2-y2)3 
CSssS B =F te f exp{ jaw} {Ps : ae ee 0) \ aw, 
where 
ip Sa ivi — Bava ; 
Bi —B5 
But 


oe aaa lb eg ag ke ioe Ca 
ike i exp {jau} . IOS URIN du= 21 | sa} exp {27jaf } ena > 
where $p = 2\pi jf$, and it can be shown (Campbell and Foster, pairs 558, 438,
439, 202) that


eS (r2 
lim| exp {27jaf} ae pe ae ~ K,[? sel (ee eee 


a—>0 


dot a i, —a(2-+e2) HK [(A2+72)!x] da, . . (51) 

Aer CaaS exp{—z +") } ol ( ay 
and since the square root has a non-negative real part the second term 
is finite (Admiralty Computing Service). It follows that as $a \to 0$


pO, a) es “= sz aK ola + 71) esol +70) - (62) 
ue Onset 
+ JOP ols 2 K, {a(A2+y2)!}—p Ky fa(2+ 2)a}] 
AGU ieeigr aes A tag 3 {lo AP +7i) 3-H Ko % 
a (Veo 2, Ky {a(?2+74)*} —yieoK ota( (A2+y5)*}}. « - (58) 


ViKO —YoHy 




Now as $a \to 0$, $K_0[a(\lambda^2+\gamma_1^2)^{1/2}] \to -\log a$, and effecting this replacement it
follows from (44) that


jap. YG eT 
Ly TPo 
so that 
Votes —ite) 3 \ we 
I(z)->1(0) exp< — (>) |a|>. - . - (55) 
(o)>1(0)exp{ — (Mar Ie | 


(see remarks following equation (47)), which is the required result as 
set out in §1(2). 


§5. EXTENSION TO MULTILAYER MEDIUM.

The terms in the integrals (20) and (21) are of two kinds, those involving
an exponential factor and those not. The former give rise in subsequent
integrals to terms which remain finite as $a \to 0$, whilst terms derived from
the latter tend to infinity. In order to extend the theory to multilayer
media it is necessary only to evaluate the new P and Q functions. The 
method given in §2 is readily extended to this case, and it is seen that 
the terms in the new expressions for P and Q which represent the effect 
of distant layers all contain exponential factors, so that following through 
the analysis given in §§ 3 and 4 the necessary generalization as set out 
in §1 is established. 


§6. CONCLUSIONS. 

Summarizing the preceding results it is found that for a thin wire there 
exists an exponential mode of propagation the propagation constant of 
which is that of the medium in which it is buried ; whilst if it lies in an 
interface its propagation constant is closely related to those of the two 
abutting media. In either case distant layers have no effect. This mode 
is termed the principal mode. 

For a wire of finite radius there exists a mode of propagation closely 
approximating to the principal mode. The change in propagation 
constant from the interface value to the totally buried value may be 
expected to be quite sharp, the transition taking full effect at depths of 
burial of a few diameters, the argument being suitably modified when
other layers are present, but in any case distant layers have no effect
on the mode of propagation. 


BIBLIOGRAPHY. 

Admiralty Computing Service, Dictionary of Laplace Transforms, Pt. 2B,
Section 2.3.

Campbell, G. A., and Foster, R. M., Bell Telephone System Monograph B584,
Fourier Integrals for Practical Applications.

McLachlan, N. W., 1941, Bessel Functions for Engineers (Oxford: University
Press).

Stratton, J. A., 1941, Electromagnetic Theory (New York: McGraw-Hill
& Co.).

Sunde, E. D., 1936, Electrical Engineering, 1338; 1940, Electrical Engineering
Trans. Sup., 987; 1945, Bell System Tech. J., 253.

Whittaker, E. T., and Watson, G. N., 1940, Modern Analysis (Cambridge:
University Press). Section 6.22 et seq.


APPENDIX.


ON THE INTERCHANGE IN THE ORDER OF INTEGRATION 
IN CERTAIN REPEATED INTEGRALS. 


(a) In equations (28) and (38) repeated infinite integrals were trans- 
formed by interchanging the order of integration. This is always 
permissible if one pair of limits is finite, and the integrand is continuous 
in both variables within the limits of integration, but where both limits 
are infinite the argument breaks down, and it is only in very restricted 
circumstances that general theorems are available. 

In the case of equation (38), where the integrand is a known function, 
a laborious proof of the legitimacy of the interchange can be constructed, 
and is outlined in (b) below. The justification of (28) must rest on less 
rigid arguments of a physical nature. If the integrands are functions 
which represent physically possible distributions of potential, and if the 
final answer must be that of a real physical problem, an answer obtained 
by interchanging the order of integration is likely to be correct if there 
is no reasonable alternative solution. 

This presupposes some test of the validity of the solution adopted. 
An example is the equation 


\[ \int_0^{\infty} J_0(at)\cos(bt)\,dt =
\begin{cases}
(a^2-b^2)^{-1/2} & \text{for } b < a, \\
\infty & \text{for } b = a, \\
0 & \text{for } b > a,
\end{cases} \]


an equation similar to those in the text. At $b=a$ two possibilities exist.
If $b$ approaches $a$ from below, the limit value of infinity is obtained.
If $b$ approaches $a$ from above the continuation value of the integral by
a summation process is zero. The selection of the value at $b=a$ would
depend on physical arguments, assuming that the final result must agree 
with other available evidence. Thus if the quantity could not be infinite 
then the representation is inexact and a special study of the physical 
conditions around b=a would have to be made. 

In the present instance the results obtained agree satisfactorily with 
the physical requirements of the problem and with the available experi- 
mental evidence, so that the procedure adopted is considered valid. 


(b) In equation (38) the transformation
\[ \int_0^{\infty}\cos\lambda v\,dv\int_0^{\infty} F(u^2)\,u\,J_0[u(v^2+a^2)^{1/2}]\,du
 = \int_0^{\infty} F(u^2)\,u\,du\int_0^{\infty}\cos\lambda v\,J_0[u(v^2+a^2)^{1/2}]\,dv \tag{56} \]
is used, in which $F(u^2)$ is any one of certain functions as described in
the text. The inner integral on the L.H.S. is the electric intensity of a
Hertzian dipole, so that from energy considerations it must be $L^2(0, \infty)$,
and so its Fourier transform exists. The R.H.S. has been shown to
exist in the text. Writing unity for $F(u^2)u$ in (56) the truth of the modified
equation is demonstrable by direct evaluation of the integrals concerned,
so that it is sufficient to prove (56) for $F(u^2)u - C$ where $C$ is any constant.
This must be done for the real and imaginary parts of $F(u^2)$, which can
be any one of the following functions:


(w2+-y6)~# 
pis(02-198)!—olu-+ 92)! exp {—=2h(w2-+y78)}} Tey 
py (U2 +-y2)*+-po(u2 +7)?” (u?-+-y6)# ant 


(u?-+-y@)* exp {—2h(u2+-5)*} eR Sis 
[py (Ww? +2) pg (u? +?) Lyi (we? +2) # + y2g(u2+ yd)t] 


Introducing the function
\[ H(u) = \begin{matrix}\mathrm{Re}\\ \mathrm{Im}\end{matrix}\,\{u[F(u^2)u - C]\} \]
where in each case $C$ is $\lim_{u\to\infty} F(u^2)u$, it can be shown that $u_0$ exists such
that for $u > u_0$ $H(u)$ is a monotonic function increasing or decreasing to
zero, and since the sign of $H(u)$ is immaterial the second case is assumed.
Now by using the asymptotic expansion of the Bessel function it can 
be shown that 
(és ee ia cos AvJ of u(v2-+-a?)5] 7, 
Z a u 
us bounded for 7,>2%, y¥; >Yp Where WO >%,>%,, OLY. >Y. 
It follows by applying the second mean value theorem that 
{ H(w) 


= a is cos AvJ of u(v?+-a?)?] du| <e 
Y 


vy 


for all a >a, >2x(e), Y2>y,>y(e) and the required result follows without 
further trouble. 


ACKNOWLEDGMENTS. 


The author wishes to express his thanks to Dr. S. Whitehead, Director
of the British Electrical and Allied Industries Research Association,
for permission to publish this paper, to Mr. R. H. Golde for his interest
and guidance, and to Mr. S. Michaelson for assistance with the Appendix.


XXIV. Quantal Aspects of Scientific Information. 


By D. M. MacKay, 
Wheatstone Laboratory, King’s College, London*. 


[Received December 15, 1949.] 


SUMMARY. 


This paper relates to the borderline linking experimental and theoretical 
physics with mathematical logic, and covers at several points ground 
which is common to the theory of communication. Its purpose is,
therefore, to introduce in language as far as possible common to workers 
in all these fields, a quantitative approach to the analysis of scientific 
concepts and the design of experiments. 

Many scientific concepts in different fields have a logically equivalent 
structure. One can abstract from them a logical form which is quite 
general, and takes on different particular meanings according to the 
context. Identification of this structure in precise terms leads to a 
clarification both of experimental principles and of fundamental relations 
in different fields of physics. 

It is suggested that the most fundamental abstract scientific concept 
is quantal in its communicable aspects. It is defined as ‘‘ Information- 
content ’’ and has two features: the a priori or structural, and the 
a posteriori or quantitative. To each there corresponds a quantum of 
information, representing the minimal elementary proposition relating 
respectively to the structural and quantitative aspects of scientific 
statements. 

Scientific statements are regarded as complexes of these elementary 
propositions. The information-content of a result can be completely 
represented by means of a vector in a multidimensional space, the 
dimensionality of the space and the square of the length of the vector 
indicating respectively the amounts of structural and quantitative 
information provided, while the orientation of the vector specifies the 
result. The information-content is shown to be fundamentally limited 
by the number of conceptual units of space-time devoted to the experiment, 
with obvious practical implications. Expressions are also derived 
measuring the amount of detail possible in a result under different 
conditions. 

From the standpoint here adopted the various uncertainty-relations 
of physics appear basically as axioms expressing the quantal nature of 
communicable information, consequent on the use of logical forms ; and 
the quantity entropy plus information-content appears as a fundamental
invariant of a physical system. The present paper is, however, intended
mainly to be explanatory and introductory, and consideration of more
fundamental implications is postponed.

* Communicated by Professor J. T. Randall, F.R.S.
SER. 7, VOL. 41, NO. 314. MARCH 1950

A glossary of less-familiar or new terms introduced is appended. 


I. The Formalism of Information Theory. 


§ 1. INTRODUCTION. 


WHEN a physicist ventures into the domain of epistemology he is perhaps
especially vulnerable to the charge of arbitrariness in his treatment and
choice of definitions. This difficulty is particularly acute in connection
with a theory of scientific information, for the brimstone flavour of
metaphysics hangs over the subject, and adds zest to criticism. The 
present paper is an introductory account of an approach which seeks 
to reduce arbitrariness to a minimum by an appeal to logical first principles. 

Experimentation abounds with indications that the everyday concepts 
of science are not the most fundamental. Each time that a compromise 
has to be struck, say, between the sensitivity and the response-time of 
a galvanometer, or the noise-level and band-width of an amplifier, or 
the resolving power and aperture of a microscope, one has an intuitive 
feeling that in each case some quantity is remaining constant behind all 
experimental manipulations—something more fundamental than either 
of the quantities in question. We say that ‘‘ Nature cannot be cheated ”’ ; 
and examples of this principle recur throughout the realm of measurement, 
and not only in microphysics. 

Is there not then a way of expressing scientific facts so that in any 
context a single universal principle can apply? Presumably in 
sufficiently fundamental terms such a principle should become obvious.
This thought at least was the stimulus to the present investigation,
which leads to the definition of an abstract concept common to all
scientific statements, and responsible for their logical significance. In
terms of this quantitative concept of Information, various semi-intuitive
principles can be seen to have a precisely definable basis in a general
axiom. 

The establishment of a basic formalism for this purpose will occupy 
the first part of the paper. The second will deal in more detail with 
matters raised by the first, and will outline some of the practical
implications of the theory. It is emphasized, however, that any examples
given are intended mainly to help readers with different backgrounds to
understand the context, and are not meant to anticipate the more
thorough treatment of particular applications which it is hoped to give
in later papers. In particular references to communication theory are
designed mainly to establish the connection between the author’s approach
and that which has independently been so elegantly developed by various
workers in that field (e.g. Gabor 1946, Shannon 1948, Wiener 1948). 


§ 2. ACQUISITION OF SCIENTIFIC INFORMATION.

It is necessary, with some apology, to begin with a brief analysis of the
nature of the scientific method and scientific propositions, in order to
introduce our terms and arrive at the minimal form in which scientific
facts can be expressed without loss of significance.

A scientific statement may be defined as a precise description of certain
events pictured as populating a tract of a coordinate-space. Its
essential purpose is the communication of information derived from an
experiment, an activity in which events are classified. The acquisition
of scientific information thus involves two distinct problems, each of 
which has given rise to a somewhat loose appropriation of the term in a 
technical sense. 

Firstly, one must devise apparatus and/or prepare some system of 
classification, such that an adequate number of independent categories 
can be defined when describing the result. For example, if fluctuations
varying in frequency between 1 and 100 per second are to be observed,
the apparatus must be capable of responding in a time of the order of
1/100 second, i.e. of giving 100 “independent” readings per second.
Or again, if ten shades of colour are recognized as distinct modes of
description (i.e. coordinate-values on a “colour-axis”) of the members
of a population requiring to be classified, then ten columns must be
provided in the observer’s notebook.

When a chain of apparatus is involved (including the observer), then
the differentiating capacity of the least-discriminating link determines
the number of independent categories in the result.
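
The two rules of thumb in the preceding paragraphs can be written out as a minimal sketch. The function names and the example link capacities are illustrative assumptions, not part of the paper; the first function simply takes the response time to be the reciprocal of the highest fluctuation frequency, as in the 1/100 second example.

```python
def independent_readings(max_frequency_hz, duration_s):
    """Number of independent readings available in a time tract when the
    apparatus responds in a time of the order of 1/max_frequency_hz."""
    return int(round(max_frequency_hz * duration_s))


def chain_categories(link_capacities):
    """Number of independent categories delivered by a chain of apparatus:
    the least-discriminating link sets the limit."""
    return min(link_capacities)


if __name__ == "__main__":
    # Fluctuations up to 100 per second, observed for one second.
    print(independent_readings(100, 1.0))
    # Hypothetical chain: instrument, recorder, observer's notebook columns.
    print(chain_categories([100, 40, 10]))
```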

There is a sense in which this number, that is to say the number of 
independent dimensions or ‘“‘ degrees of freedom ’”’, can be regarded as a 
measure of the information supplied by the experiment. (Its analogue 
in the case of communication-theory has been so defined by Gabor (1946).)
It is, however, inadequate to represent completely the information
derived from an experiment, as we shall see. Giving a more general
connotation to a term coined by Gabor, we shall define it in due course
as the logon-content of a result.

Secondly, the experiment must be performed, by using our apparatus
(be it galvanometer, microscope, or eye-plus-notebook) to classify events
in the chosen tract of coordinate-space. For instance we may record 100
independent values of a quantity as a function of time, the amount of
that quantity which is associated with each of the 100 identifying-points
provided by our method in the time-tract considered. We may divide
a sample of a population into ten colour-groups, determining the fraction
associated with each of the ten identifying points on the colour-axis.

Performance of the experiment thus results in the association of a 
number with each of the ‘ labels ’—categories or degrees of freedom— 
defined by the structure of the experimental method. The usual 
scientific statement, however, takes the form of an inference having a 
certain probability deduced from the experimental result. Thus we arrive
at a second common use of the term information, to signify the source of
confidence in a given number as representative of the class identified by 
its label. 

As we shall see, it is possible to give a precise numerical significance 
to this concept also. It is the essential a posteriori complement to the 
a priori logon content, for the complete representation of the information 
derived from an experiment. 

So much for the context in which what follows is relevant. We must 
now begin at the other end, so to speak, by analysing the nature of the 
scientific statements made to convey the results of an experiment. In 
this analysis we shall find a basis, free of arbitrariness, for a quantal 
representation of scientific information. 


§ 3. THE QUANTIZATION OF INFORMATION. 


(a) Elementary propositions.

A scientific statement is a logical form based on limited data. As such
it can be dissected ultimately into a pattern of elementary “atomic”
propositions (Sheffer 1913, Wittgenstein 1922). Each atomic proposition
states a fact so simple that it cannot be further decomposed, and hence
has only the characteristic “true” or “false”; its existence is its only
attribute. Each therefore owes its individuality only to its position in the
logical pattern and may be regarded as a kind of link between two entities
in the pattern. Ultimately the latter must make contact with the
primitive sense-data in terms of which it acquires meaning, but this
aspect of the subject does not concern us here. We shall treat as
“elementary” the simplest propositions relating to the concepts of
measurement and classification. Any debt so incurred to psychology 
is better settled out of court. 

Ideally, a scientific statement is based entirely on observable evidence, 
and the ideal statement which would describe all the information supplied 
by a particular experiment is presumably reducible to a pattern of 
independent elementary propositions relating to observations. We 
should expect therefore to be able to define a measure of information- 
content corresponding to the number of such propositions substantiated 
by a given experiment. On this view we are led to define one kind of 
unit of information as that which decides us to add one elementary proposition 
to the “frame” (the pattern of propositions) which is logically sufficient
to define the results observed.


(b) Quantization of Information. 


The element of decision here appears to be fundamental, for it leads 
to a quite irreducible quantization of the action of scientific information. 
This is forced on us by our use of logical forms—by our apparent 
incapability of communicating information unambiguously in any other 
way. Our statements are rigorous; to speak of an “imperceptible
change” in a statement would be meaningless; the minimum change
possible is the addition of one element to the logical pattern; and a unit
of information is simply that which causes us to make one such addition.

Assuming then that the formation of scientific statements is a quantal 
process, the next task is to identify our quanta in terms of the processes 
of experimentation studied in §2. It is here that care is necessary to 
avoid confusing the two common uses of the term information. Prior 
information is presented by knowledge of the experimental procedure ; 
posterior information arises as the latter is carried out. The one defines 
the structure of the ultimate statement ; the other defines the amount 
of evidence which it subsumes. Each is quantal in its communicable 
aspects, though, of course, the respective ‘‘ quanta ”’ differ qualitatively. 
It will best serve our purpose to consider first posterior information. 


§ 4. POSTERIOR OR METRICAL INFORMATION.
(a) Scale-Units. 


The quantal character of posterior information arises from the way 
in which we describe a scientific measurement. A description of a 
result is basically a set of instructions enabling the reader to reproduce 
for himself a conceptual pattern representing the experience of the 
observer. The scientific discipline recognizes only one type of elementary 
experience as necessary and sufficient for this purpose; the most 
elementary observational proposition asserts the existence of a coincidence- 
relation between two entities. On the other hand we define a magnitude
by saying that it occupies a certain interval on a scale. Logically this
occupance-relation between scale-interval and magnitude is a consequence
of the existence of coincidence-relations between the ends of the “unknown”
and two definable graduation-entities on the scale.

Now the fineness of graduation of a scale, whether physically marked or
estimated, is always limited for reasons which will shortly be examined.
The point at present is that for every observation there is a minimum
separation between neighbouring graduation-entities, $B_{n-1}$, $B_n$, $B_{n+1}$, say,
below which either we cannot define or cannot substantiate with probability
greater than one-half, a proposition of the form: “A falls into
$B_{n-1}$ to $B_n$ and not into $B_n$ to $B_{n+1}$”. This smallest interval we shall call
the scale-unit appropriate to the observation.


(b) Metron-content. 


With a scale marked in such “ minimum meaningful intervals”, a 
magnitude can be specified by the number of intervals which it occupies. 
This number provides an index of information which we can describe 
as metrical, relating to measurement. The unit of metrical information, 
which we shall term a metron, is that which enables one elementary 
interval to be represented as occupied, in the logical pattern to be 
communicated. In other words each metron specifies one elementary 
occupance-relation. 




Thus what we carry away from a measurement is basically an integer, 
the number of conceptually separate occupance-relations which have 
been specified. This integer we shall refer to as the metron-content of the 
result. 

There are certain requirements which must be satisfied if the metron- 
content is to be equated to the number of atomic propositions justified 
by a given experimental sequence. It must be so defined that the 
metron-content of two similar but independent sequences is twice that 
of either alone. The metron-content of a result must be incapable of 
augmentation by purely logical manipulation, and all complete represen- 
tations of a given result (e.g. in terms of a frequency-spectrum, or a 
time-function) should have the same metron-content. 

If these conditions are satisfied, a result yielding a given metron-content 
provides a fixed number of logical elements out of which dependent 
or equivalent statements can be constructed. We can regard the making 
of these statements as a rearranging of the elements of the logical pattern. 
A suitable formalism to represent this process will be considered in § 6. 


(c) Proper scales. 


It is obvious that not all physical quantities can be represented on 
scales having a constant scale-unit. For example the square root of 
a measured quantity has presumably the same metron-content as the 
quantity itself, since the latter can be derived from it by a purely logical 
process ; but the metron-content cannot be proportional to both quantities. 
Conceptual scales, which we may call proper scales, are not necessarily 
linear in terms of physical magnitude. 

The proper scale of a quantity subject to random fluctuations is a 
good example of this fact. Fisher (1935) justifies his definition of 
statistical “information” (which is analogous to our metron-content)
as a quantity proportional to the reciprocal of variance, by noting that
variance depends inversely on the number of samples involved, and is a
measure of the uncertainty with which a given sample can be regarded
as representative. If the number of samples is multiplied by n, the
information provided (assuming independence) should be n times as great.
Since the same process reduces the variance by a factor 1/n, the information
provided should therefore be proportional to its reciprocal. 
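
The scaling argument above can be checked with a short sketch. The function names are illustrative only, and the simulation assumes independent Gaussian samples with a fixed seed; neither assumption comes from the paper.

```python
import random
import statistics


def information(variance):
    """Statistical information taken as the reciprocal of the variance."""
    return 1.0 / variance


def variance_of_mean(single_variance, n):
    """Variance of the mean of n independent samples of equal variance."""
    return single_variance / n


def simulated_variance_of_mean(n, trials=4000, sigma=1.0, seed=0):
    """Monte Carlo estimate of the variance of the mean of n Gaussian samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    # Analytically, n samples carry n times the information of one sample.
    print(information(variance_of_mean(2.0, 5)) / information(2.0))
    # The simulated ratio of variances should come out near 10.
    print(simulated_variance_of_mean(1) / simulated_variance_of_mean(10))
```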

The metron-content $i$ of a single measurement of a randomly fluctuating
quantity $x$ having a probable error $\pm\Delta x$ is therefore not $x/\Delta x$ but
$i = x^2/(\Delta x)^2$, and the physical scale of $x$ is divided into significant intervals
which are non-uniform*. (The scale of $x^2$, however, is uniformly divided.)
This means that we can give a probability $p$ of just $\tfrac{1}{2}$ to a statement in
the form: $x$ (or $x^2$) occupies $i$ intervals. The figure of $\tfrac{1}{2}$ implies, not
that we are ignorant of the magnitude of $x$, but that its terminal is as
likely to lie within the $i$th interval as to fall outside it. It also implies
that it would be unprofitable to employ a narrower interval in making
our statement, for we should then be bound to make it in a form with
$p < \tfrac{1}{2}$: more likely to be false than true.

* On the abstract conceptual scale of course only order and number are
significant, so that all intervals are equivalent. It would be meaningless to
say they were equal.
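
A minimal sketch of the proper scale just described is given below. It assumes, for illustration only, that the probable error $\Delta x$ is a fixed constant, so that the interval boundaries fall where $x^2/(\Delta x)^2$ takes integer values; the function names are hypothetical.

```python
import math


def metron_content(x, dx):
    """Metron-content of a single measurement x with probable error dx,
    taken as x**2 / dx**2 (returned unrounded)."""
    return (x / dx) ** 2


def interval_boundaries(dx, count):
    """Positions on the physical scale of x at which the metron-content
    reaches 0, 1, ..., count, assuming a constant dx."""
    return [dx * math.sqrt(k) for k in range(count + 1)]


if __name__ == "__main__":
    bounds = interval_boundaries(1.0, 4)
    # Interval widths in x shrink as x grows (non-uniform scale of x) ...
    print([b - a for a, b in zip(bounds, bounds[1:])])
    # ... while the same boundaries are evenly spaced in x**2.
    print([b * b - a * a for a, b in zip(bounds, bounds[1:])])
    print(metron_content(6.0, 2.0))
```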


(d) Observation and measurement. 


In more complex cases there is a clearer distinction between observation 
and measurement. For instance we observe the position of a pointer, 
and hence measure a voltage. Now the precision of observation certainly 
sets an upper limit to that of measurement. Basically we observe the 
distance between the pointer and some reference mark (which must be 
a point at which another reading has been taken under standard conditions).
It is possible to deduce the metron-content of the figure we obtain for this 
distance, and we could call it the metron-content of the observation, but 
it is not of fundamental significance. In principle it is the reproducibility 
of the pattern of events symbolized by the statement made which governs 
the metron-content of a measurement. The greater the metron-content 
the more detailed a statement can be made with a 50-50 chance of 
non-contradiction. Thus fluctuations in the readings may be the 
limiting factor. Indeed one may say that in a well-designed experiment 
this is always the case (cf. the operation of radio search receivers at the
“limit of noise”). An experiment is not giving full information unless
the metron-content of the observation exceeds that of the measurement. 


(e) Displacement of origin. 


An important point has not so far been raised. A measured value of
a quantity $x$ is inevitably represented as occupying an interval $\Delta x$.
Accordingly since metron-content is essentially positive, the smallest
“observable value” which we can justifiably attribute to a quantity
linearly related to metron-content is not zero, but $\tfrac{1}{2}\Delta x$. The first interval
which it can occupy on our conceptual scale has a width $\Delta x$. A detailed
consideration of this point, and of the relation between different information
scales, will, however, require a separate paper. 


(f) Summary. 


The suggestion here made may now be summarized as follows: full 
justice is done to the accuracy (the logical content) of any single measurement
when it is described in terms of its scale-unit and its metron-content.
The latter may be represented as the number of intervals, on an abstract
conceptual “proper-scale”, occupied by the measured quantity; but
the number of intervals on this scale will not necessarily be proportional 
to the measured quantity, since the scale-unit may also depend on the 
latter. Quantities for which such a proportionality does hold, have a 
special claim to be regarded as fundamental in the particular context. 


296 D. M. Mackay on the 


§ 5. STRUCTURAL INFORMATION. 


(a) Logon-content. 


It was Kant who said: “reason has insight only into that which it 
produces after a plan of its own.” The design of an experiment is essen- 
tially the specification a priori of a pattern, of categories in terms of 
which alone the result can be described. All the events of the experi- 
ment must find a place in one or other of these, though of course not all 
categories will necessarily find an exemplar in a given experiment. 

Since each independent category enables us to introduce a measure of 
differentiation—i. e. of form or structure—into our account of a result, 
we can regard knowledge thereof as providing us with prior or structural 
information. We can therefore define a unit of structural information or 
(using Gabor’s term (loc. cit.)) a logon, as that which enables us to formulate 
one independent proposition, describing one independent feature of the 
result. (Whether when we have formulated the proposition we shall find 
any metrical information to give it logical content, is another matter.) 
The amount of structural information in a result, the logon-content, is 
thus the number of independent categories or degrees of freedom precisely 
definable in its description. 


(b) Logon-capacity. 


In many cases structure is defined in terms of a reference-coordinate. 
For example the density-pattern on a photographic plate can be described 
by a function of one or more space-coordinates ; and the structure of a 
telephony signal can be specified by a time function. The logon-capacity 
of an experimental method can in such cases be defined as the number of 
logons which it specifies per unit of coordinate-interval, or coordinate-space 
if several coordinates are involved. The total number of independent 
categories or features in the result is then the integral of the logon- 
capacity over the extent of coordinate-tract occupied. 

Thus the logon-capacity of a microscope in a particular region in the 
focal plane can be defined in logons/cm.², and measures the resolving-power 
in that region; for suitable test-objects a resolving power in a given 
direction can also be defined, in logons/cm. The logon-capacity of a 
galvanometer or a communication-channel is measured in logons per 
second, and represents the number of (practically) independent readings 
per second which can be made with the apparatus. 


(c) Frequency-response. 


It is useful here to introduce the idea of frequency-response in a some- 
what generalized form. With instruments measuring functions of time 
it is common practice to define performance in terms of their frequency- 
response to sinusoidal inputs of different time-periodicities. In the same 
way one can define Fourier-variables associated with other coordinates, 
in order to assess the performance of other types of instrument. 


Quantal Aspects of Scientific Information 297 


For example one could use sinusoidal density-patterns of different 
space-periodicities to define the “frequency-response” of a microscope, 
if one may so extend the use of the term. In each case a precise meaning 
can be given to the frequency-bandwidth of an instrument, as the effective 
range of input-frequencies (i. e. of the Fourier-variable) to which it is 
sensitive. (For example in a microscope this is a space-frequency range 
normally extending from zero to a value directly proportional to the 
aperture used. It thus measures the fineness of detail perceptible, in 
terms of a Fourier-type of analysis of the transparency-function which 
represents the object. One need hardly say that “frequency” here is 
quite distinct from the frequency of the light used.) It should be noted 
that with more than one coordinate the Fourier-variables must be 
represented in a Fourier-space in which “bandwidth” is defined by a 
“volume” not necessarily rectangular.* 


(d) Bandwidth and logon-capacity. 


The bandwidth of an instrument in the above sense is directly related 
to its logon-capacity. The relation arises from the well-known uncertainty-principle 
which can be written 

\[ \Delta f \cdot \Delta q \geq K_s , \qquad (2) \]

where $\Delta f$ represents the effective range of frequencies (conjugate to a 
coordinate q) to which the apparatus is sensitive, $\Delta q$ twice the 
“uncertainty”† in q, and $K_s$ a number which we may take to have the 
value $\tfrac{1}{2}$. 

Equation (2) will later be justified from our present point of view. 
Meanwhile we may note that points on the q-axis cannot be defined 
uniquely at closer intervals than $K_s/\Delta f$, so that the logon-capacity is 
$\Delta f/K_s$ or $2\Delta f$. To attempt to talk of “an interval smaller than $\Delta q$” 
would be to try to construct a logical pattern identical with that of “a 
frequency higher than $\Delta f$” which cannot by definition appear in any 
result and is therefore observationally meaningless. The logon-content 
$l$ of an experiment involving a tract of extent q is thus 

\[ l = q \cdot \Delta f / K_s . \qquad (3) \]
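
As a hedged numerical illustration of the relation between bandwidth and logon-capacity, the following Python sketch (not part of the original paper) counts the logons available in a given tract, taking the logon-capacity as bandwidth divided by $K_s$ with $K_s = \tfrac{1}{2}$ as in the text. The function names and the 3000 c/s, 2 second figures are hypothetical choices made only for illustration.

```python
import math


def logon_capacity(bandwidth, k_s=0.5):
    """Logons per unit of coordinate, taken as bandwidth / K_s."""
    if bandwidth < 0 or k_s <= 0:
        raise ValueError("bandwidth must be >= 0 and k_s must be > 0")
    return bandwidth / k_s


def logon_content(bandwidth, extent, k_s=0.5):
    """Whole number of logons definable in a tract of the given extent."""
    return math.floor(logon_capacity(bandwidth, k_s) * extent)


# Hypothetical telephony channel: 3000 c/s bandwidth observed for 2 seconds.
capacity = logon_capacity(3000.0)       # logons per second
content = logon_content(3000.0, 2.0)    # logons in the whole tract
```

Rounding down reflects the statement in the text that the number of logons is integral; a different convention for "effective" bandwidth would change the numerical factor but not the proportionality.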


* The author is indebted to a correspondent for raising this point. 

† i. e. $\Delta q$ is the effective range of q. 


§ 6. REPRESENTATION OF INFORMATION. 


(a) The Information Matrix. 

The relation between the two features of information becomes clear 
in terms of a simple formalism. The definition of information with which 
we began was operational; “information” acquires meaning in terms 
of what it does. Thus to represent the total information, metrical and 
structural, provided by an experiment, we require a concise representation 
enabling us to deduce its effect on all conceivable statements relating 
to the result: in mathematical jargon, an information-operator. Suppose 
that an experiment has yielded $l$ logons. To represent these as 
independent, we may think of them as defining a set of orthogonal rays 
in an $l$-dimensional “information-space”, with corresponding unit-vectors 
$\alpha_1, \alpha_2, \ldots \alpha_l$. Taking these unit-vectors as a basis, we can now represent 
the total information yielded by an experiment by an information-matrix 
in the “canonical” form : 


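\[ \mathbf{I} = \begin{pmatrix} i_1 & & & \\ & i_2 & & \\ & & \ddots & \\ & & & i_l \end{pmatrix} \]

where $i_1, \ldots i_l$ are the metron-contents of the logons represented by 
$\alpha_1, \ldots \alpha_l$ respectively. 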
Now relative to the logon-basis $\alpha_r$, any proposed dependent statement 
can be represented by a unit vector-function $\phi$, say, defined by 

\[ \phi = \sum_{r=1}^{l} f_r \alpha_r , \]

where the $f_r$ are direction-cosines of $\phi$ (or its corresponding 
ray) and measure the relevance of the corresponding logons to the statement. 
If now for example we wish to calculate the total metron-content 
$i_\phi$ which the proposed statement would have, we need only form the 
product 

\[ i_\phi = \phi' \mathbf{I} \phi , \qquad (4) \]

where $\phi'$ is the transpose of $\phi$. 

This procedure is common in the statistical estimation of significance, 
and has its analogue in the calculation of expectation values in quantum 
mechanics. In fact all the necessary mathematics at this point is at hand 
in textbooks on quantum theory (see e. g. Tolman 1938). 

Without at present going into detail, we may note briefly three further 
points: (a) the trace of I, $i = \sum_{r=1}^{l} i_r$, represents the total metron-content 
of the original result; (b) a logically equivalent re-statement of the result 
corresponds to a unitary transformation of the basis, under which I will 
in general lose its simple diagonal form. This means that each 
“component” of a dependent statement is no longer a function of just 
one of the original logons. But (c) the trace of the information matrix 
is invariant under unitary transformations, so that, as our axiom 
demands, the total metron-content is unaltered by logical reformulation. 
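
The three points just listed can be checked numerically in a small case. The sketch below is an illustration only, not the author's computation: it assumes a two-logon experiment with hypothetical metron-contents 9 and 4, a hypothetical statement with direction-cosines (0.6, 0.8), and a real rotation as the simplest instance of a unitary transformation. The helper names are invented for the example.

```python
import math


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def apply(matrix, vector):
    """Matrix acting on a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def trace(a):
    return sum(a[r][r] for r in range(len(a)))


def metron_content(info, phi):
    """The product phi' I phi for a unit vector phi."""
    return sum(p * x for p, x in zip(phi, apply(info, phi)))


# Hypothetical canonical information-matrix: two logons with 9 and 4 metrons.
info = [[9.0, 0.0], [0.0, 4.0]]
phi = [0.6, 0.8]                      # direction-cosines of a dependent statement

angle = math.pi / 6                   # arbitrary rotation of the basis
rot = [[math.cos(angle), -math.sin(angle)],
       [math.sin(angle), math.cos(angle)]]

restated = mat_mul(mat_mul(rot, info), transpose(rot))   # no longer diagonal
phi_restated = apply(rot, phi)                            # same statement, new basis
```

In this example the restated matrix acquires off-diagonal terms, while its trace and the metron-content of the statement are unchanged to within rounding.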


(b) Geometrical Representation. 

As in the analogous formalism of quantum theory, this procedure can 
be given a simple geometrical representation (see e. g. Rojansky 1946). 
Because of its diagonal form, the operator I can be defined by a 
vector with components $\sqrt{i_1}, \sqrt{i_2}, \ldots \sqrt{i_l}$, which we can call 
the information-vector. This has a length $\sqrt{i}$, and defines a point, 
or rather a volume-element, in the information space, relative to the 
origin. The information-vector, or the element whose occupance it 
specifies, can be regarded as the basic invariant behind the various statements 
we may make about the experiment; it is the “real result”. 
The components of the vector along each axis are the square roots of the 
metron-contents of the corresponding logons, in accordance with our 
conservation-axiom, § 2(b). Any dependent statement is now defined by a 
direction in the space, and its metron-content is the square of the projection 
of $\sqrt{i}$ in that direction. Unitary transformations correspond to rotations 
of the axes, and the significance of points (a) to (c) above is easily seen. 

We have seen that the dimensional multiplicity $l$ is fixed by the choice 
of experimental method (including the volume of space-time occupied). 
Performance of the experiment results fundamentally in the collection, 
and allocation to the various logons, of the metron-flow arising from the 
impact of data on the apparatus plus observer. 

The quantal character of this process has one important effect. Since 
the number of ways in which $i$ metrons can be distributed among $l$ logons 
is limited, only a certain number of different results can conceivably be 
given by a particular experiment. An experiment, indeed, is an attempt 
to choose between a finite number of possibilities. If it were not so 
experimentation would be impossible. 

The merit of an experimental method can therefore be gauged, in part 
at least, by the speed and ease with which it enables one to identify the 
appropriate information point. 

We shall return to the question of the detail in a result in Part II. (§ 9). 


II. Practical Details. 


In developing the formalism of Part I. we incurred a number of debts 
and shelved several problems. These we must now seek to remove in 
a more detailed discussion. 


§ 7. METRICAL SCALE UNITS. 


The first omission concerned the nature of the limitations on the 
accuracy of statements. In either metrical or structural propositions, 
a numerical magnitude must be quantized in order to be logically 
representable. The origin of the quantization in the two cases is quite 
different however, as we shall now show. 


(a) The metrical information-axiom. 

A single metrical statement about a quantity, y say, is logically sterile 
unless we identify it by a coordinate-label, q say. The proposition : 
“I have received i metrons relating to y” represents our actual experience, 
and has no vagueness; but vagueness arises directly we conceive y to 
be a function of q, and try to formulate a proposition about y in terms 
of q. 




For example we may say precisely that a certain measurement of 
power W over a given period is worth 100 metrons. But when we try 
to give structure to our result—to say how the power varied during that 
period—we are quite unable to assign finite metron-content to an estimate 
of the power at any given instant. The most we can do is to speak in 
terms of a small interval in which at least one metron was gained. The 
reason is simply that the information takes time to accumulate : there is 
only a finite metron-density on the time axis. (We may remark that there 
is an equally strong case for saying that time requires information to make 
it real, and that our sense of time is an abstraction from the flux of 
information we receive. This is outside our present subject, however.) 

We have now a single logical pattern serving two modes of description. 
The result is an uncertainty-relation (really an information-axiom) which 
arises in general as follows : 

Metrons, the “atoms” of the information we have received, 
are “scale-free”. That is to say we can construct a number of dependent 
propositions involving the same number of metrons, and can regard them 
as logically equivalent statements of the result, as long as we fix appropriate 
definitions of the scale-unit in each case. Thus to translate $i$ metrons 
into a statement of the magnitude of y we must have a scale-unit $\Delta y$ 
such that 

\[ y = i \cdot \Delta y . \qquad (5) \]

If at the same time we wish to say how y varied as a function of q (i. e. to 
spread our information along a q-axis, such as the time-axis) we must do 
so by defining a metron-density, say $\rho_i$ metrons per unit of q; so that if 
the information $i$ was gained over a tract $\Delta q$, 

\[ i = \int_{q}^{q+\Delta q} \rho_i(q)\, dq . \qquad (6) \]

If for $\rho_i(q)$ we substitute its average value in the interval $\Delta q$, and call 
it $\rho_i$, we have from (6): $\rho_i = i \cdot (1/\Delta q)$, which, writing $\Delta\rho$ for $1/\Delta q$, we may write 

\[ \rho_i = i \cdot \Delta\rho . \qquad (7) \]

Now the definition (7) of $\rho_i$ is logically equivalent to the definition (5) 
of y. Both are anchored to the same $i$ metrons. To connect the two 
statements we define the relation between the corresponding scale-units 
by means of a conversion-factor $K_m$, such that $\Delta y = K_m \cdot \Delta\rho$. Thus we 
have the generalized uncertainty-relation or information-axiom 

\[ \Delta y \cdot \Delta q = K_m , \qquad (8) \]

which shows that the limit to the accuracy of measurement of y is directly 
proportional to the extent of q devoted to the process. 

(b) Natural-units. 


There is however another way of regarding this. We can also write 
(from (5) and (8)) 

\[ \Delta q = i \cdot (K_m / y) . \]




This is again in the form of (5), and $K_m/y$ can evidently be regarded as 
a natural unit of q, say $\epsilon_q$, within which just one metron is acquired. 
In other words, $i$ metrons acquired in an interval $\Delta q$ enable us conceptually 
to subdivide it into $i$ intervals of magnitude $\epsilon_q$. Conversely, we can say 
that the metron-content of a measurement cannot exceed the ratio of the 
coordinate tract $\Delta q$ associated with it, to the appropriate natural-unit of q, $\epsilon_q$. 
$\epsilon_q$ can be calculated for any given experiment, and plays the logical 
role of an atom of space or time. In the most general case in which y is 
a function both of space and time, the relevant natural unit is the product 
of all the $\epsilon_q$'s, and the total metron-content cannot exceed the number 
of these space-time units in the total volume of space-time occupied by the 
experiment. There is a sense in which $\epsilon_q$ is a “scale-unit” of q; but as 
the tract irreducibly associated with each measurement of y is $\Delta q$, $\Delta q$ is 
the smallest interval which enters into any proposition involving y, and 
the symbol $\Delta$ is retained with that connotation. In the limit where the 
metron-content of each measurement is unity, $\epsilon_q$ equals $\Delta q$. 
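
The bookkeeping of this subsection can be sketched numerically. The example below is an illustration under stated assumptions rather than the author's own calculation: it takes the natural unit as the conversion-factor divided by y, the largest metron-content as the tract divided by the natural unit, and the scale-unit of y as y divided by the metron-content. The numerical values and function names are hypothetical.

```python
def natural_unit(k_m, y):
    """Natural unit of q: conversion-factor divided by the measured magnitude y."""
    return k_m / y


def max_metron_content(tract, k_m, y):
    """Largest metron-content: number of natural units contained in the tract."""
    return tract / natural_unit(k_m, y)


def scale_unit(y, metrons):
    """Scale-unit of y when y is expressed as that many metrons."""
    return y / metrons


# Hypothetical values in arbitrary units.
k_m, y, tract = 0.5, 10.0, 2.0

eps_q = natural_unit(k_m, y)                  # interval yielding one metron
i_max = max_metron_content(tract, k_m, y)     # metrons obtainable in the tract
delta_y = scale_unit(y, i_max)                # resulting scale-unit of y
product = delta_y * tract                     # should reproduce k_m
```

Doubling the tract in this sketch doubles the metron-content and halves the scale-unit of y, which is the sense in which accuracy is bought with coordinate-extent.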


(c) Practical aspects. 

Space permits only a brief outline of the practical implications of 
this principle, which provides a ready “criterion of satisfaction” with 
the accuracy of a given method. It is particularly helpful in considering 
the statistical matching of one part of an experiment to another. For 
instance if a weak link in a sequence is known to yield only a certain 
metron-content $i_0$, it is possible to estimate the time and/or space which 
it is worthwhile to devote to each of the remaining links, and to gain 
in overall metron-content per unit of space-time, or metron-capacity, 
by deliberately designing these so as to barter accuracy for speed or 
compactness. It is wrong to say (e. g.) that it “does no harm” to use 
a galvanometer which is unnecessarily sensitive; for a less sensitive one 
could have a more rapid response, and allow several experiments each 
yielding $i$ to take place in the same time. 


(d) Entropy and metron-content. 

The calculation of $\epsilon_q$ in particular cases sometimes yields a pleasant 
sense of having returned to familiar ground by a devious path. For 
example in measurement of a power W in an environment at a temperature 
T for which Boltzmann statistics are valid, $K_m$ (equation 8) is the power 
per metron·sec.⁻¹, or the energy per degree of freedom $\tfrac{1}{2}kT$. $\epsilon_q$, the 
shortest significant interval of time (corresponding to the acquisition 
of one metron) is thus $kT/2W$. The entropy-change per metron is 
evidently 

\[ \Delta S = -\tfrac{1}{2} k , \qquad (9) \]


so that entropy and metron-content are equivalent quantities, both having 
quantal aspects, a change in one being opposite in sign to the change in 
the other. Without at present going into detail, we may note that in a 
physics which started from the concept of Information as one of its basic 
quantities, the sum Entropy-plus-Information-content would rank as a 
fundamental invariant. (See § 10 (c) below.) From this viewpoint, 
temperature would appear as a measure of the scale on which energy 
is represented in the information-pattern. 
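
To give a feeling for magnitudes, the thermal example above can be evaluated directly. The following sketch is illustrative only: it assumes the energy per metron is half of Boltzmann's constant times the temperature, as stated in the text, and uses a hypothetical measured power of one picowatt at 300 K. The function names are invented for the example.

```python
BOLTZMANN = 1.380649e-23  # joules per kelvin (SI defined value)


def energy_per_metron(temperature):
    """Energy per degree of freedom, (1/2) k T, in joules."""
    return 0.5 * BOLTZMANN * temperature


def shortest_interval(power, temperature):
    """Shortest significant interval of time, k T / (2 W), in seconds."""
    if power <= 0:
        raise ValueError("power must be positive")
    return energy_per_metron(temperature) / power


# Hypothetical measurement: one picowatt observed at room temperature.
eps_t = shortest_interval(1e-12, 300.0)       # roughly two nanoseconds
metrons_per_second = 1.0 / eps_t              # upper bound on metron-flow
```

On these assumptions a picowatt signal at room temperature could yield at most a few hundred million metrons per second; a weaker signal or a warmer environment lengthens the interval in proportion.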

An analogous relation holds in the case of a space-statistic such as the 
density of a photographic image. 


§ 8. STRUCTURALLY-DEFINED SCALE-UNITS. 


(a) The structural Information-axiom. 

In structural propositions the absolute magnitude of y is irrelevant. 
They are essentially definitions of propositional functions of which y is 
to be the argument. Accordingly the scale-unit of q can be defined only 
in terms of coincidence-relations independent of the magnitude of y. In 
§ 5 we used the concept of bandwidth (generalized) to define a scale-unit 
$\Delta q$ which was a property of the apparatus used, ascertainable beforehand 
by an independent experiment, and hence counting as prior information 
on subsequent occasions. The general relation (2) which we used then, 
can be justified directly in terms of our initial axiom that only coincidence- 
relations or compounds thereof are valid as logical elements in scientific 
statements. 

Let us consider then the case of a simple harmonic function of q, with 
periodicity f. It will be associated with definitive points on the q-axis, 
independent of amplitude, whenever the function crosses the axis, i. e. at 
intervals of half a period. (It is also arguable that ingenious experimentation 
should make it possible to define the quarter-period or the radian-period 
independently of amplitude.) 

In any case, the scale of q is conceptually provided with a set of points 
at intervals of $K_s/f$, where $K_s$ is of the order 1. 

These points, however, are all logically identical, representing collectively 
a single fact: the accurate value of f. What we want is a set of uniquely 
identifiable points, to serve as labels. To single out desired points we 
must provide a comparison-pattern to act as a “pointer”. For instance 
a frequency $f - \Delta f$ will produce a pattern coinciding with the first at 
intervals of $K_s/\Delta f$. If all values of $\Delta f$ from zero can be observed, a 
continuous range of intervals from infinity down to $K_s/\Delta f$ can be 
observationally defined. The structural scale-unit of q, $\Delta q$, is thus 
$K_s/\Delta f$. Conversely, if $\Delta q$ is an arbitrary interval, the number $l$ of logons 
relating to it which can be formulated can be written as $l \leq \Delta q \cdot \Delta f / K_s$, 
since $l$ is integral. Thus for a single logon $\Delta q \cdot \Delta f \geq K_s$ (equation (2)). 
(It should be remembered that in practice the terms “effective length” 
and “bandwidth” often have a slightly different numerical connotation 
from that here employed.) 

It is of interest that the metron-density function specified by a single 
logon, for which equation (2) becomes an equality, is a Gaussian 
probability-function, since our data are exactly the assumptions made 
by Gauss in his derivation. (See Appendix.) Analysis of a function 
into logons is effectively description in terms of superimposed Gaussian 
functions (Gabor, loc. cit.). 


(b) Resolving Power. 


To illustrate these ideas we may examine the problem of optical 
resolving power. As our purpose is only expository we shall consider 
the somewhat unrealistic case of a narrow rectangular aperture and an 
ideal object which can be represented by a transparency-function of 
only one coordinate, x say, parallel to the long edge of the aperture. 
The situation can then be represented by the two-dimensional diagram 
of fig. 1. P is a point on the aperture, with coordinates $(r, \theta)$ relative 
to a point O on the object. Light is incident in the direction Oy. The 
problem has two parts. Firstly, we shall establish a relation between 
the position of P and the information gained through it about the object : 
and secondly, we can calculate the logon-capacity for a given type of 
aperture. 

Fig. 1. Normal incidence. ($\Delta x = \lambda / 2 \sin\theta$.) 

Suppose that light of wavelength $\lambda$ is received through P. This makes 
it possible to distinguish logically points along the possible light paths 
to P at intervals of $\lambda/2$, so that with centre P, we can give logical 
significance to a series of arcs of radii $n\lambda/2$, where $n$ has integral values. 
These arcs will intersect the x-axis and establish coincidence-relations 
with the incident wavefront at intervals of $\lambda/(2 \sin\theta)$, in the vicinity of O. 




There is thus a correspondence between the point P and a space-function 
having a periodicity $f_x = (\sin\theta)/\lambda$ in the object. Indeed if the aperture 
is closed everywhere except at P and the pole, N, the pattern seen has 
just this periodicity. 

This analysis has assumed that a plane wavefront was incident normally 
on the x-axis. If instead the light is incident in the xy plane at an 
angle $\phi$ to OX (fig. 2) the possible light-paths from P must be traced 
through the object to their origin on the incident wavefront ; and the 
logically describable paths will be those whose total lengths are integral 
multiples of $\lambda/2$. The corresponding space-periodicity defined on the 
x-axis will then be 

\[ f_x = \frac{1}{\lambda} (\sin\theta + \sin\phi) . \]


Fig. 2. Incidence at angle $\phi$. 


To calculate the logon-capacity we need only observe that the 
“bandwidth” is directly proportional to the range of $\sin\theta$. Thus if $\theta$ 
can have all values between $\pm\theta_m$, $\Delta f = (2/\lambda) \sin\theta_m$, and the 
logon-capacity is $(4/\lambda) \sin\theta_m$ logons per cm. 

This logon-capacity will not, however, be realized in the case of normal 
incidence, for then $f_x$ is only $(1/\lambda) \sin\theta_m$, and only $(2/\lambda) \sin\theta_m$ points are 
specified on the x-axis. Only by increasing $\phi$ to $\theta_m$ can the full 
resolving-power be achieved. Increasing $\phi$ beyond this value, to give so-called 
“dark-ground illumination”, will increase $f_x$ but not $\Delta f$. The number 
of logons per cm. will not be increased, though owing to the removal of 
background light the visibility of detail is generally improved. 
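
The figures quoted in this subsection are easily evaluated. The sketch below is illustrative only; it assumes the bandwidth is twice the sine of the half-angle divided by the wavelength, the full logon-capacity is twice the bandwidth (taking the structural constant as one half), and normal incidence realizes half of that. The wavelength of 5 × 10⁻⁵ cm and the half-angle with sine 0.5 are hypothetical, as are the function names.

```python
def bandwidth(wavelength_cm, sin_theta_max):
    """Space-frequency bandwidth in cycles per cm: (2 / wavelength) * sin(theta_m)."""
    return 2.0 * sin_theta_max / wavelength_cm


def full_logon_capacity(wavelength_cm, sin_theta_max):
    """Logons per cm with suitably oblique light: twice the bandwidth."""
    return 2.0 * bandwidth(wavelength_cm, sin_theta_max)


def normal_incidence_points(wavelength_cm, sin_theta_max):
    """Points per cm specified at normal incidence: (2 / wavelength) * sin(theta_m)."""
    return 2.0 * sin_theta_max / wavelength_cm


# Hypothetical instrument: green light, aperture half-angle of 30 degrees.
wavelength = 5e-5      # cm
sin_max = 0.5

capacity = full_logon_capacity(wavelength, sin_max)       # about 40 000 logons per cm
at_normal = normal_incidence_points(wavelength, sin_max)  # about 20 000 per cm
```

The factor of two between the two results is the gain attributed in the text to oblique illumination.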




These results are well-known, but they have here been deduced by way 
of illustration from an axiom which can perhaps be generalized as follows : 
the limit to scientific observation is the limit of our logical vocabulary. If a 
phenomenon can be defined (in terms of the atomic propositions of the 
scientific method) it can in principle be observed. 


§ 9. ESTIMATION OF IMPORTANCE. 
(a) Variety. 


We can now calculate the number of logically definable information-points 
in the $l$-space of § 6. The number, which we may call M*, measures 
the total range or variety of the possible results, out of which the 
experiment has chosen one. It is therefore a measure of the importance 
of a result. For a given upper limit to $i$, M will depend on the way in 
which quantization occurs. Assuming that the apparatus is such that 
all integral values of each $i_r$ can be distinguished, M is easily shown to be 

\[ M = \frac{(l+i)!}{l!\, i!} \qquad (10) \]

if the total metron-content can have all integral values from 0 to $i$. 
If, however, $i$ must always have its maximum value, M is found to be 

\[ M = \frac{(l+i-1)!}{(l-1)!\, i!} . \qquad (11) \]

(This is reasonable, since the possibility of varying $i$ merely confers one 
further degree of freedom on the system as a whole.) 
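
The two counts can be verified by direct enumeration in small cases. The sketch below is an illustration, not the author's derivation: it takes the number of ways of allotting at most i whole metrons among l logons to be the binomial coefficient (l+i)!/(l! i!), and the number with exactly i metrons to be (l+i−1)!/((l−1)! i!), which are the standard combinatorial results, and compares them with a brute-force count. The function names are hypothetical.

```python
from itertools import product
from math import comb


def variety_up_to(l, i):
    """Allocations of at most i metrons among l logons."""
    return comb(l + i, i)


def variety_exactly(l, i):
    """Allocations of exactly i metrons among l logons (l >= 1)."""
    return comb(l + i - 1, i)


def brute_force(l, i, exact):
    """Count allocations by listing every tuple of whole-number metron-contents."""
    count = 0
    for allocation in product(range(i + 1), repeat=l):
        total = sum(allocation)
        if total == i or (not exact and total < i):
            count += 1
    return count


# Small hypothetical case: 3 logons, at most 4 metrons.
m_up_to = variety_up_to(3, 4)
m_exact = variety_exactly(3, 4)
```

The second count equals the first with one logon fewer, which is the sense of the remark that freedom to vary the total confers one further degree of freedom.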
We can also distinguish between cases in which (a) the apparatus sets 
no limit to the metron-density $\rho_i$, but only to the total $i$, and (b) the 
apparatus sets an upper limit to $\rho_i$. In case (a), any $i_r$ can have the full 
value $i$, and formula (10) applies. The maximum of M is attained when 
$l = i$, so that the optimum metron-density in a statement is one metron 
per logon, as we have seen before. If $i$ is large, Stirling’s formula for $i!$ 
then gives 

\[ M \simeq \frac{2^{2i}}{\sqrt{\pi i}} . \qquad (12) \]
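
The quality of the Stirling estimate can be judged numerically. The sketch below is illustrative only: it assumes that with one metron per logon the variety is the central binomial coefficient (2i)!/(i!)², and compares its base-2 logarithm with the logarithm of the standard asymptotic form 2^(2i)/√(πi). The choice i = 50 and the function names are hypothetical.

```python
from math import comb, log2, pi


def detail_exact(i):
    """log2 of the exact count (2i)! / (i!)^2."""
    return log2(comb(2 * i, i))


def detail_stirling(i):
    """log2 of the Stirling estimate 2^(2i) / sqrt(pi * i)."""
    return 2 * i - 0.5 * log2(pi * i)


exact = detail_exact(50)
estimate = detail_stirling(50)
gap = estimate - exact      # small and positive for large i
```

For i of this size the two values differ by well under one hundredth of a binary unit, so the estimate is adequate for the comparisons made in the text.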


In case (b), each $i_r$ can equal but not exceed $\rho_i \cdot \Delta q$. Interest now 
focuses on the value of M per unit of q, $M_q$, which is given by the number 
of unit “volumes” in an $l$-dimensional “cube” measuring $(1 + \rho_i \cdot \Delta q)$ 
in each direction. (The addition of 1 allows for the value $i_r = 0$.) Since 
$l = 1/\Delta q$ per unit of q, this makes 

\[ M_q = (1 + \rho_i / l)^{l} . \qquad (13) \]

In the useful limit $l = \rho_i$, so that 

\[ M_q = 2^{l} . \qquad (14) \]




* The notation here has been chosen so as to coincide as far as possible with 
that of Shannon (1949). 


SER. 7, VOL. 41, NO. 314.—MARCH 1950 7, 




(b) Detail. 
A convenient measure of the detail in a result is the equivalent number 
of “yes-no relations”*, 

\[ N = \log_2 M . \qquad (15) \]

For equation (12), in case (a), N has the value 

\[ N = 2i - \tfrac{1}{2} \log_2 (\pi i) , \qquad (16) \]

in this case approximately twice the logon- or metron-content. The 
ability to group all metrons under any one logon approximately doubles 
the number of independent two-valued propositions which can result 
from the experiment. In case (b), equation (14), the density of detail 

\[ C = \log_2 M_q = l \ \text{or} \ \Delta f / K_s . \qquad (17) \]

Note, however, that even increasing $l$ to infinity only makes $M_q \to e^{\rho_i}$, and 

\[ C \to \rho_i \log_2 e . \qquad (18) \]


In some cases (e. g. where scale readings are proportional to voltage, 
in the presence of a constant noise voltage), only whole-number coordinates 
of the information point may be distinguished (and not merely 
whole-number combinations of metron-content). $M_q$ then cannot exceed 

\[ \left( \sqrt{1 + \rho_i / l} \right)^{l} \ \text{(cf. (13))} \quad \text{or} \quad M_q \leq (1 + \rho_i / l)^{l/2} . \qquad (19) \]

This is the number of unit-spheres in an $l$-dimensional sphere of radius 
$\sqrt{1 + \rho_i / l}$. When $l = \rho_i$, 

\[ M_q = 2^{l/2} \quad \text{and} \quad C = l/2 . \qquad (20) \]
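
The behaviour of the density of detail under the two assumptions can be tabulated with a few lines of code. The sketch below is an illustration under stated assumptions: it takes the variety per unit of q as (1 + ρ/l) raised to the power l when whole-number metron-contents are distinguishable, and to the power l/2 when only whole-number coordinates are, with ρ the metron-density and l the logon-capacity. The value ρ = 8 and the function names are hypothetical.

```python
from math import e, log2


def variety_density(rho, l):
    """Variety per unit of q when every whole-number metron-content is distinguishable."""
    return (1.0 + rho / l) ** l


def detail_density(rho, l):
    """log2 of the variety per unit of q: yes-no relations per unit of q."""
    return l * log2(1.0 + rho / l)


def detail_density_coordinates(rho, l):
    """Detail when only whole-number coordinates of the information point count."""
    return 0.5 * l * log2(1.0 + rho / l)


rho = 8.0                                         # hypothetical metron-density
matched = detail_density(rho, 8.0)                # one metron per logon
very_fine = detail_density(rho, 1.0e6)            # logon-capacity pushed very high
ceiling = rho * log2(e)                           # limiting value for unbounded l
matched_coordinates = detail_density_coordinates(rho, 8.0)
```

With these figures the matched case gives 8 binary units per unit of q, the limiting value is about 11.5, and the coordinate-limited case gives half the matched value.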


(c) Increase of detail. 


These equations lead to one very practical point. To increase the 
amount of detail in a result, it is more profitable to increase $l$ than $i$, 
unless already $l > i$; but both methods can be used. 

In other words we have an incentive to apply ingenuity in such matters 
as the resolving-power of optical instruments, knowing that as long as 
the metrical information provided is not usefully employed, something 
practical remains to be done, even if classical theory appears to be 
discouraging. In the field of communication this aspect is already under 
vigorous development. 

The fundamental point is that as long as $\rho_i$ exceeds the logon-capacity 
$l$, there exists a metrical scale-unit smaller than the structural one. One 
may, therefore, legitimately aspire to formulate in some way or other 
more than $l$ independent propositions about the structure in question. 


* This is analogous to Shannon’s measure of ‘ information’’, and the 
relations here derived may be compared with those in his paper on Communication 
in the presence of Noise (1949). 




§ 10. PRACTICAL ASPECTS. 


Thinking in these terms proves helpful and often stimulating in assessing 
the shortcomings, merits and possibilities of different experimental 
methods. In particular, subjective preferences in matters of debate 
can often be usefully supplemented by quantitative considerations of the 
kind suggested here. 


(a) Precision. 


The precision of a single measurement can be enhanced indefinitely 
by increasing the space-time tract irreducibly* associated with it. Thus 
in experiments to determine a constant, efforts should be directed towards 
“logon-compression”: reducing the frequency-response (i. e. the logon-capacity) 
of the apparatus, with respect to time and space. In short 
best results are obtained by acting consistently with one’s belief that the 
constant will not alter with time or position, so that one logon will be 
sufficient. In a sequence of operations, the logon-capacity of each should 
be adjusted so that the metron-content does not greatly exceed the value 
which it has in the stage with the narrowest bandwidth. This will enable 
each subsidiary operation to occupy the minimum space and time, so 
giving a higher overall metron-capacity, and making possible more 
repetitions of the experiment in a given space-time tract. 


(b) Structural detail. 


In experiments to determine structure on the other hand, the aim 
should be to reduce the relevant conceptual unit of coordinate to the 
minimum at which metron-content is adequate. One may even say 
(with reservations) that there is almost always an improvement in 
resolving power (with a given input of energy) when intelligent steps 
are taken to sacrifice metrical information. Any device which makes 
it possible to define a smaller interval can in principle be used to enhance 
the detail attainable. 


(c) Conservation of Information. 


Normally a compromise is required, and the formalism of § 6 is 
illuminating. Processes of logon-compression and expansion can be 
represented by rotations of the information-vector. In most processes 
some attenuation will occur, if they are not reversible in the thermo- 
dynamic sense. On the other hand if they are, we deduce from the 
invariance of entropy plus metron-content that a principle of conservation 
of information applies: It is impossible for a reversible system to destroy 
information, however complex may be the process required to reconstitute it 
in its original form. At the same time no artifice of design or manipulation 
of a result can increase the accuracy of a measurement beyond the limit set 
by i (i. e. an increase in the metron-content of individual logons can be 
bought at the expense of logon-capacity, but the limit is set by the total 
quantity i, which we have seen to depend on the expanse of coordinate-tract 
devoted to the experiment). 

* See § 7(b). 

This in effect is the general principle which lies behind the many 
“incompatibilities” encountered in experimental design. 


(d) Illustrations. 


A typical example of the benefits of logon-compression is the information 
provided by the analyses of insurance companies. They reduce erratic 
data obtained in enormous detail, to a small set of propositions of great 
reliability. An analogous process occurs when a large capacitor is used to 
smooth the output of a photoelectric cell. 
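
The smoothing action of a capacitor can be imitated, very roughly, by a first-order recursive average. The sketch below is an illustration only and makes several assumptions not in the text: the readings are simulated Gaussian fluctuations about a constant level of 5 units, the smoothing fraction of 0.05 per sample stands in for the time-constant, and the function name is hypothetical.

```python
import random
import statistics


def rc_smooth(samples, alpha):
    """First-order smoothing: each output moves a fraction alpha toward the input."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    state = samples[0]
    smoothed = []
    for sample in samples:
        state += alpha * (sample - state)
        smoothed.append(state)
    return smoothed


rng = random.Random(0)                        # fixed seed, simulated data only
raw = [5.0 + rng.gauss(0.0, 1.0) for _ in range(2000)]
smooth = rc_smooth(raw, alpha=0.05)

settled = smooth[200:]                        # discard the start-up transient
raw_variance = statistics.pvariance(raw)
smooth_variance = statistics.pvariance(settled)
```

The smoothed readings scatter far less than the raw ones about the same level, but successive smoothed values are no longer independent: fewer logons, each carrying more metrons.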

The converse process is well exemplified by the “equalization” of a 
telephony signal distorted by a narrow-bandwidth channel. Here the 
effective dimensionality of the information-vector is increased. The 
information-vector is rotated into some of the “dormant” dimensions, 
which were occupied by larger components in the original signal but 
had them attenuated in the channel. The consequent all-round reduction 
in information per logon appears in practice as a corresponding increase 
in noise. 

If, however, the channel is electrically non-dissipative (i. e. contains 
only reactive components), the original signal can theoretically be 
restored without loss of information. 

As a final illustration of the kind of approach here suggested, we may 
consider the type of aperture in a microscope which would give the 
maximum of contrast from a particular object. “Contrast” is essentially
a matter of metron-content, so that the question may be put: “What
transparency-function in the aperture will yield the maximum metron-
content?” We may for illustration make the gross simplification of
assuming that contrast is impaired only by random background-light
uniformly distributed over the aperture. Again considering the linear
case for simplicity, we can describe the object by its “space-spectrum” $S(f_x)$,
which will give rise to a corresponding intensity function $S(\xi)$ over the
aperture, which extends from $\xi_m$ to $-\xi_m$. If the transparency-function
is $t(\xi)$, the total random background power received, $W_B$, is proportional
to $\int_{-\xi_m}^{\xi_m} t(\xi)\,d\xi$, while the total coherent power $W_A$ is
$\int_{-\xi_m}^{\xi_m} S(\xi)\,t(\xi)\,d\xi$.
The ratio $W_A/W_B$ will be a maximum when $t(\xi)=S(\xi)$ apart from a constant
factor. Thus the aperture should be masked by a transparency representing
the spectrum of the expected image. If the requirement is for the 
maximum resolving-power and metron-content, it is known (see Appendix) 
that the spectrum corresponding to one logon is Gaussian in outline. 
Thus the best compromise will be obtained for general purposes by masking 
the aperture with a Gaussian transparency-function. 
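
The two quantities compared in this illustration, the background power (the integral of the transparency over the aperture) and the coherent power (the integral of intensity times transparency), can be evaluated numerically. The sketch below is only an illustration under stated assumptions: the Gaussian intensity, the aperture half-width `XI_M` and all function names are hypothetical choices, not taken from the paper. It compares an open aperture with one masked by a transparency equal to the intensity function.

```python
import math

def trapezoid(func, lo, hi, steps=2000):
    """Integrate func over [lo, hi] with the composite trapezoidal rule."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * width)
    return total * width

XI_M = 3.0  # assumed half-width of the aperture (arbitrary units)

def intensity(xi):
    """Assumed Gaussian intensity over the aperture (illustrative only)."""
    return math.exp(-xi * xi / 2.0)

def power_ratio(transparency):
    """Coherent power divided by background power for a given transparency."""
    coherent = trapezoid(lambda xi: intensity(xi) * transparency(xi), -XI_M, XI_M)
    background = trapezoid(transparency, -XI_M, XI_M)
    return coherent / background

ratio_open = power_ratio(lambda xi: 1.0)   # unmasked aperture, t = 1
ratio_masked = power_ratio(intensity)      # aperture masked with t = S

print(f"open aperture  : {ratio_open:.3f}")
print(f"masked (t = S) : {ratio_masked:.3f}")
```

The comparison shows only that this particular mask improves on the open aperture under the assumptions made; it does not by itself test the maximum property stated in the text.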




§ 11. CONCLUSION.


The chief aim of the present paper has been to present a new way of 
thinking of experimentation, an effort to isolate the abstract concept
which represents the real currency of scientific intercourse, from the 
various contexts in which it appears. Normal modes of expression of 
scientific facts involve logical redundancy in vocabulary. Logical 
analysis in terms of irreducible elements leads to a minimum vocabulary, 
in which a fact is represented by a conjunction of its abstract information- 
content with the particular context which gives it concrete meaning. 

Towards this minimum vocabulary the present paper can claim to 
make only a preliminary and naive contribution. It has been seen, 
however, that scientific information is inherently quantal in its communi-
cable aspects; and that the various uncertainty-relations of physics,
though arising in different ways, are basically expressions of this one fact. 
Analogies in physics emerge as identities between basic structures, and the 
validity of arguments from analogy is easily assessed. It is to be expected 
that problems analysed in these terms would find an answer with the 
minimum of logical manipulation. 

The concepts here introduced find ready application in the experimental 
field. Their fruitfulness in other connections remains to be seen. One 
cannot help feeling, for example, that the dual character of communicable 
information has a direct bearing on the physiology of thought-processes 
and the design of electronic reasoning-instruments. Its metrical aspect 
in particular suggests a possible “analogue” mechanism which is not
at present receiving as much attention as the “digital” one, and which
it is also hoped to examine in a later paper. 

The philosophical reader will recognize that the discussion has not 
centred on the nature of reality, but on the nature of the propositions 
which the scientific method permits us to make about reality. The 
implications which this “compulsion by the logical form” have for
fundamental theory cannot be examined here; but if the present approach
can help in any way to close familiar loopholes in otherwise attractive
epistemological arguments, it will repay further study.


§ 12. ACKNOWLEDGMENTS. 


This paper is an expanded version of a lecture given to the Maxwell 
Society of King’s College in January 1948. Since then several papers 
have appeared, chiefly in American journals, dealing with communication 
theory in terms of a formalism similar in some respects to that suggested 
here. This has entailed some duplication in §9 of the present paper, but it 
has still seemed worth while to present derivations of the various relations 
in their original form, since the approach is here somewhat different and 
more general. In particular it is hoped that controversy over alleged 
“refutations” of particular definitions (see e.g. Tuller 1949) may be
allayed by the clarification of the complementary relation between
different senses of “Information”.


It is a pleasure to express indebtedness to Dr. D. Gabor, both for 
contributions which his work has made to the thought of the present 
paper, and for helpful comments on its presentation. Especial thanks 
are due to the many colleagues of the author whose detailed criticism has 
helped to prevent a number of errors and obscurities. 


§ 13. GLOSSARY OF TERMS USED.


Bandwidth.—In general terms, the region of Fourier-space to which the
output of an instrument is confined. In particular, the effective 
frequency-range (conjugate to a given coordinate) to which it 
responds (§ 5(c)). 

Information.—That which alters the total statement representing “all
that is the case” (§ 2).

Information-space.—The space in which independent propositions are 
represented by orthogonal rays, and their metron-contents by the 
squares of distances along these rays (§ 6). 

Information-vector.—The vector whose components in information-space 
are the distances just mentioned (§ 6). 

Logon.—Unit of structural information, specifying form (§ 5(a)). 

Logon-capacity.—Number of logons per unit of coordinate-space (§5(b)). 

Logon-content.—Number of logons in a statement (§ 5(a)). 

Metron.—Unit of metrical information, specifying magnitude (§ 4(b)). 

Metron-capacity (cf. logon-capacity) (§ 7(c)). 

Metron-content (cf. logon-content).—A measure of the support given by 
the data to a statement (§4(d)). 

Natural unit.—The coordinate-interval over which one metron is acquired
(§ 7(0)).

Proper-scale.—A representational scale on which magnitude is represented 
by a number of intervals occupied, though not necessarily linearly 
related to that number (§ 4(c)). 

Scale-unit.—The ‘smallest meaningful interval’ on a scale (§ 4(a);
§§ 7 and 8).


APPENDIX. 


The metron-density function specified by a logon. 

Gauss based his derivation of the probability function on three 
assumptions : 

(A) The probability of committing an error z in a magnitude X depends 
only on z. 

(B) All values of X are a priori equally acceptable ; the probability 
is not affected by the value of X. 

(C) The most probable value of a magnitude X of which we possess 
n readings, is the arithmetic mean of these readings. 




From these it is shown in any text book on probability that
the probability of committing an error of between $z$ and $z+dz$ is
$\frac{h}{\sqrt{\pi}}\exp(-h^2z^2)\,dz$, where $h$ is a constant denoting the accuracy of
the measurement.

Our problem is to derive a function with analogous properties : 


(a) The metron-density specified by a logon relating to $q_0$ depends only
on the displacement $q$ from $q_0$.

(b) The density-function is not affected by the choice of $q_0$.

(c) The function has a single maximum at the centroid of the metron
distribution.


This follows from the facts that: (1) the two possible elementary
occupance-relations specified by the logon must each have a probability
of ½. There must therefore be as many metrons in support of occupance
of the region to the left of $q_0$ as of that to the right of $q_0$. (2) Only one
singularity can be defined by one logon, so that both asymmetry and
multiplicity of turning-values are ruled out. (3) The singularity cannot
be a minimum, since one-half of the total area under the curve (metron-
content) lies between finite limits defined as $\pm\Delta q/2$. We are thus justified
in making use of Gauss's result and deducing that the density function
is of the form $A\exp(-h^2q^2)$. If we use condition (c) (3), that the inter-
quartile range is to be $\Delta q$, we find from tables that $h$ must be $0.941/\Delta q$.
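
The quartile condition can also be evaluated directly rather than read from tables. The sketch below assumes a density proportional to exp(−h²q²), for which the fraction of the area lying within ±a of the centre is erf(ha); the function name `solve_h` and the bisection bounds are illustrative choices. Solving erf(hΔq/2) = ½ in this way gives hΔq of about 0.954, which may be compared with the tabulated figure quoted above.

```python
import math

def solve_h(delta_q):
    """Find h such that half the area of exp(-h^2 q^2) lies within +/- delta_q / 2."""
    # The enclosed fraction is erf(h * delta_q / 2), which increases with h,
    # so bisection on h locates the value at which it equals one-half.
    lo, hi = 0.0, 10.0 / delta_q
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.erf(mid * delta_q / 2.0) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h = solve_h(1.0)
print(f"h * delta_q = {h:.4f}")
```

Because the condition depends only on the product hΔq, the same constant applies whatever the width of the interval.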

The Fourier transform of this function is also Gaussian, the indices h 
in the two cases being reciprocal. The frequency spectrum corresponding 
to a single logon has the same shape as the metron-density function, 
and could have been derived in the same way. 
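
The reciprocal relation between the two indices can be checked numerically. The sketch below assumes the transform convention with kernel exp(−2πifq), under which the transform of exp(−h²q²) is (√π/h) exp(−π²f²/h²), so that the index in the frequency domain is π/h; the constant π depends on the convention chosen, and the function names are illustrative.

```python
import math

def gaussian_spectrum(h, freq, steps=4000):
    """Numerically evaluate the Fourier integral of exp(-h^2 q^2) at frequency freq."""
    q_max = 8.0 / h  # the integrand is negligible beyond this range
    width = 2.0 * q_max / steps
    total = 0.0
    for k in range(steps + 1):
        q = -q_max + k * width
        weight = 0.5 if k in (0, steps) else 1.0
        # The sine part cancels by symmetry, so only the cosine part is summed.
        total += weight * math.exp(-(h * q) ** 2) * math.cos(2.0 * math.pi * freq * q)
    return total * width

def closed_form(h, freq):
    """Gaussian spectrum with index pi / h (constant depends on the convention)."""
    return math.sqrt(math.pi) / h * math.exp(-((math.pi * freq / h) ** 2))

for h in (0.5, 1.0, 2.0):
    for freq in (0.0, 0.2, 0.5):
        print(h, freq, gaussian_spectrum(h, freq), closed_form(h, freq))
```

A larger h (a narrower density in q) gives a smaller index π/h and therefore a broader spectrum, and conversely.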


REFERENCES. 


FISHER, R. A., 1935, The Design of Experiments, p. 188. London: Oliver and
Boyd.

GABOR, D., 1946, J. Inst. Elec. Engrs., 93 (III), 429.

ROJANSKY, V., 1946, Introductory Quantum Mechanics, Ch. IX. New York:
Prentice-Hall.

SHANNON, C. E., 1948, Bell System Tech. J., 27, 379-423; 623-656; 1949, Proc.
I.R.E., 37, 10.

SHEFFER, 1913, Trans. Amer. Math. Soc., 14, 481-488.

TOLMAN, R. C., 1938, Principles of Statistical Mechanics. Oxford University
Press.

TULLER, W. G., 1949, Proc. I.R.E., 37, 468.

WIENER, N., 1948, Cybernetics. New York: John Wiley and Sons.

WITTGENSTEIN, 1922, Tractatus Logico-Philosophicus. Kegan Paul.




XXV. Notices of New Books and Periodicals received. 


The Strength of Plastics and Glass. By R. N. HAWARD. [Pp. 245.] (London:
Cleaver-Hume Press Ltd., 1949). Price 30s. net. 


THE scope of this monograph is satisfactorily indicated by its title. The 
emphasis is mainly on experimental results : indeed in the present state of the 
theory this could hardly be otherwise. It is primarily as a source of factual 
information that the book will be valuable, with its fifty tables of data, hundred 
diagrams, and nearly four hundred references to recent literature. Those not 
primarily concerned with the technology of the subject will find it a compre- 
hensive summary of the present state of knowledge in a rapidly developing 
field of investigation. N. T. 


Colloid Science, edited by H. R. Kruyt. Volume 2. Reversible Systems. 
(Elsevier Publishing Company, Ltd. 1949). [Pp. 753.] Price £4 10s.


DURING the past three decades Professor Kruyt and his school at Utrecht have
made contributions of the greatest importance to Colloid Science. These 
contributions have ranged over the whole field of stability of lyophiles, lyo- 
phobes, and emulsions, and it is therefore with great pleasure that we welcome 
this, the first to appear of two large volumes covering many of those topics 
with which the Dutch school has been concerned. Professor Kruyt is careful 
to point out in his preface that the book is not comprehensive, but the wide 
range of topics and the authoritative nature of the treatments by the various 
expert contributors will make this book essential for all chemistry libraries, 
and many chemists will want to possess their own copy, in spite of the very 
high price. 

There are seven contributors to this volume. Bungenberg de Jong writes
the first chapter, which deals with definitions and general matters like the 
applicability of the Phase Rule to colloid systems. Modern work on polymers 
has necessitated a review of our use of many classical terms. De Jong has six 
more chapters dealing with macromolecular electrolytes, coacervates and 
complex colloids, thus covering a whole range of water-soluble systems. While 
the treatment of this section is perhaps rather too extensive in comparison 
with the others, an accessible account in English has been needed for some 
time. Also, water-soluble polymers have usually received scant treatment 
in books on polymer chemistry. Houwink and J. J. and P. H. Hermans are 
responsible for a very well-documented treatment of non-electrolytic polymers. 
A strange absentee from the section on osmotic pressure is the well-known 
apparatus of Fuoss and Mead, which, in fact, is much used by polymer chemists. 
However, the treatment is otherwise very complete, particularly on the
theoretical side. A good account is given of the fundamental theoretical 
work of Huggins and Flory, and the key position occupied by the experimental 
work of Gee and Treloar in this field is clearly to be seen. 

The book has been very well translated from the Dutch by Dr. L. C. Jackson. 
It should make a useful companion to the recent volumes under the same title 
by A. E. Alexander and P. Johnson, to which it is complementary in many 
respects. The type, illustrations and binding are excellent. We may expect 
that this timely and thorough account of certain aspects of Colloid Science, 
will greatly stimulate research in the subject. Deis 


[The Editors do not hold themselves responsible for the views
expressed by their correspondents.]


